From martin at v.loewis.de  Thu May  1 00:50:43 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 01 May 2008 00:50:43 +0200
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <20080430144804.GA26439@panix.com>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>	<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>
	<20080430144804.GA26439@panix.com>
Message-ID: <4818F7C3.7060806@v.loewis.de>

> There's a big difference between "not enough memory" and "directory
> consumes lots of memory".  My company has some directories with several
> hundred thousand entries, so using an iterator would be appreciated
> (although by the time we upgrade to Python 3.x, we probably will have
> fixed that architecture).
> 
> But even then, we're talking tens of megabytes at worst, so it's not a
> killer -- just painful.

But what kind of operation do you want to perform on that directory?

I would expect that usually, you either

a) refer to a single file, which you are either going to create, or
   want to process. In that case, you know the name in advance, so
   you open/stat/mkdir/unlink/rmdir the file, without caring how
   many files exist in the directory,
or

b) need to process all files, to count/sum/backup/remove them;
   in this case, you will need the entire list in the process,
   and reading them one-by-one is likely going to slow down
   the entire operation, instead of speeding it up.

So in no case do you actually need to read the entries incrementally.

That the C APIs provide chunk-wise processing is just because
dynamic memory management is so painful to write in C that the
caller is just asked to pass a limited-size output buffer, which then
gets refilled in subsequent read calls. Originally, the APIs would
return a single entry at a time from the file system, which was
super-slow. Today, SysV all-singing all-dancing getdents provides
multiple entries at a time, for performance reasons.

Regards,
Martin

From greg.ewing at canterbury.ac.nz  Thu May  1 00:49:23 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 01 May 2008 10:49:23 +1200
Subject: [Python-3000] range() issues
In-Reply-To: <ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<d38f5330804290730oe40394dr87f9849d89d4851f@mail.gmail.com>
	<5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com>
	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>
	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
	<1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com>
	<ca471dc20804291648j4156facev48f73b268a9f9746@mail.gmail.com>
	<d38f5330804291916ve186314o831409fe1650dfad@mail.gmail.com>
	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
Message-ID: <4818F773.4060809@canterbury.ac.nz>

Guido van Rossum wrote:
> I would like to see the following:
> 
> - sq_length should return maxsize if the actual value doesn't fit

So that code will silently behave as though the rest of
the sequence wasn't there some of the time?

Can you elaborate on the rationale for this? I'm having
trouble seeing how it's a good idea.

-- 
Greg

From guido at python.org  Thu May  1 01:00:25 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Apr 2008 16:00:25 -0700
Subject: [Python-3000] range() issues
In-Reply-To: <4818F773.4060809@canterbury.ac.nz>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>
	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
	<1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com>
	<ca471dc20804291648j4156facev48f73b268a9f9746@mail.gmail.com>
	<d38f5330804291916ve186314o831409fe1650dfad@mail.gmail.com>
	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz>
Message-ID: <ca471dc20804301600s58058cd7q9e9fd5290ffef98f@mail.gmail.com>

On Wed, Apr 30, 2008 at 3:49 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
> > I would like to see the following:
> >
> > - sq_length should return maxsize if the actual value doesn't fit
> >
>
>  So that code will silently behave as though the rest of
>  the sequence wasn't there some of the time?

Only if it uses LBYL.
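
A minimal sketch of the distinction drawn here, under stated assumptions:
ClampedSeq and its CAP value are hypothetical stand-ins (CAP plays the role
of maxsize), not anything proposed in the thread. Index-based code that
consults len() first stops at the cap, while plain iteration still reaches
every element.

```python
class ClampedSeq:
    """Hypothetical sequence whose reported length is capped."""

    CAP = 4  # illustrative stand-in for sys.maxsize

    def __init__(self, n):
        self._n = n  # the true number of elements

    def __len__(self):
        # Report at most CAP, as in the proposal under discussion.
        return min(self._n, self.CAP)

    def __getitem__(self, i):
        if not 0 <= i < self._n:
            raise IndexError(i)
        return i * i


seq = ClampedSeq(10)

# LBYL: ask for the length first, then index. Stops at the cap.
lbyl_items = [seq[i] for i in range(len(seq))]

# Plain iteration: keeps going until IndexError, so nothing is lost.
iterated_items = [x for x in seq]
```

Here lbyl_items holds 4 values and iterated_items holds all 10.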

>  Can you elaborate on the rationale for this? I'm having
>  trouble seeing how it's a good idea.

Ask the designers of the Java collections package.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu May  1 01:02:31 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Apr 2008 16:02:31 -0700
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <4818F7C3.7060806@v.loewis.de>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>
	<20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de>
Message-ID: <ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>

There is one use case I can see for an iterator-version of
os.listdir() (to be named os.opendir()): when globbing a huge
directory looking for a certain pattern. Using os.listdir() you end up
needing enough memory to hold all of the names at once. Using
os.opendir() you would need only enough memory to hold all of the
names THAT MATCH.
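
A minimal sketch of this use case. It assumes os.scandir(), which exists in
later Python 3 releases, rather than the os.opendir() name floated here; the
helper name iter_matching is made up for illustration. Only the matching
names are accumulated by the caller.

```python
import fnmatch
import os
import tempfile


def iter_matching(path, pattern):
    """Yield names in `path` that match `pattern`, one entry at a time."""
    with os.scandir(path) as entries:
        for entry in entries:
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.log", "c.txt"):
        open(os.path.join(tmp, name), "w").close()
    matches = sorted(iter_matching(tmp, "*.txt"))
```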

On Wed, Apr 30, 2008 at 3:50 PM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > There's a big difference between "not enough memory" and "directory
>  > consumes lots of memory".  My company has some directories with several
>  > hundred thousand entries, so using an iterator would be appreciated
>  > (although by the time we upgrade to Python 3.x, we probably will have
>  > fixed that architecture).
>  >
>  > But even then, we're talking tens of megabytes at worst, so it's not a
>  > killer -- just painful.
>
>  But what kind of operation do you want to perform on that directory?
>
>  I would expect that usually, you either
>
>  a) refer to a single file, which you are either going to create, or
>    want to process. In that case, you know the name in advance, so
>    you open/stat/mkdir/unlink/rmdir the file, without caring how
>    many files exist in the directory,
>  or
>
>  b) need to process all files, to count/sum/backup/remove them;
>    in this case, you will need the entire list in the process,
>    and reading them one-by-one is likely going to slow down
>    the entire operation, instead of speeding it up.
>
>  So in no case, you actually need to read the entries incrementally.
>
>  That the C APIs provide chunk-wise processing is just because
>  dynamic memory management is so painful to write in C that the
>  caller is just asked to pass a limited-size output buffer, which then
>  gets refilled in subsequent read calls. Originally, the APIs would
>  return a single entry at a time from the file system, which was
>  super-slow. Today, SysV all-singing all-dancing getdents provides
>  multiple entries at a time, for performance reasons.
>
>  Regards,
>  Martin


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ncoghlan at gmail.com  Thu May  1 01:11:23 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 01 May 2008 09:11:23 +1000
Subject: [Python-3000] range() issues
In-Reply-To: <4818F773.4060809@canterbury.ac.nz>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>	<d38f5330804290730oe40394dr87f9849d89d4851f@mail.gmail.com>	<5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com>	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>	<1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com>	<ca471dc20804291648j4156facev48f73b268a9f9746@mail.gmail.com>	<d38f5330804291916ve186314o831409fe1650dfad@mail.gmail.com>	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>	<1209582854.1924.7.camel@qrnik>	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz>
Message-ID: <4818FC9B.1080809@gmail.com>

Greg Ewing wrote:
> Guido van Rossum wrote:
>> I would like to see the following:
>>
>> - sq_length should return maxsize if the actual value doesn't fit
> 
> So that code will silently behave as though the rest of
> the sequence wasn't there some of the time?
> 
> Can you elaborate on the rationale for this? I'm having
> trouble seeing how it's a good idea.
> 

Yeah, it sounds more like behaviour I would expect from __length_hint__, 
not __len__.

In the bug tracker, Alexander mentioned the possibility of removing 
__len__ and __getitem__ support from range() objects in py3k, and 
implementing only __length_hint__ instead (leaving range() as a 
bare-bones iterable). I'm starting to like that idea more and more.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From greg.ewing at canterbury.ac.nz  Thu May  1 01:14:08 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 01 May 2008 11:14:08 +1200
Subject: [Python-3000] range() issues
In-Reply-To: <ca471dc20804301600s58058cd7q9e9fd5290ffef98f@mail.gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>
	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
	<1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com>
	<ca471dc20804291648j4156facev48f73b268a9f9746@mail.gmail.com>
	<d38f5330804291916ve186314o831409fe1650dfad@mail.gmail.com>
	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz>
	<ca471dc20804301600s58058cd7q9e9fd5290ffef98f@mail.gmail.com>
Message-ID: <4818FD40.5010701@canterbury.ac.nz>

Guido van Rossum wrote:
> On Wed, Apr 30, 2008 at 3:49 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> 
>> So that code will silently behave as though the rest of
>> the sequence wasn't there some of the time?
> 
> Only if it uses LBYL.

I don't understand that. Iteration isn't the only thing
one does with sequences. If you have a reason to call
len() in the first place, I don't see how having it
sometimes return inaccurate results can be helpful.

>> Can you elaborate on the rationale for this?

> Ask the designers of the Java collections package.

Do you mean that they have a rationale which you agree
with and think applies to Python as well, or do you
mean that you're doing it just because Java does it
and they must have a good reason?

If the former, can you refer me to a document which
espouses it?

-- 
Greg


From guido at python.org  Thu May  1 01:36:24 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Apr 2008 16:36:24 -0700
Subject: [Python-3000] range() issues
In-Reply-To: <4818FC9B.1080809@gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
	<1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com>
	<ca471dc20804291648j4156facev48f73b268a9f9746@mail.gmail.com>
	<d38f5330804291916ve186314o831409fe1650dfad@mail.gmail.com>
	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz> <4818FC9B.1080809@gmail.com>
Message-ID: <ca471dc20804301636m692fa43bq83e33758c824de3b@mail.gmail.com>

On Wed, Apr 30, 2008 at 4:11 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>  In the bug tracker, Alexander mentioned the possibility of removing
> __len__ and __getitem__ support from range() objects in py3k, and
> implementing only __length_hint__ instead (leaving range() as a bare-bones
> iterable). I'm starting to like that idea more and more.

Indeed. Do check if it breaks anything though (and how serious the breakage is).

Also note that __bool__ for a range should probably remain implemented
-- True for a non-empty range, False for an empty one.
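
A small check of the behaviour described above. It assumes a CPython 3
release in which range() supports truth testing on its own, without going
through len(); the thread only suggests that this should remain the case.

```python
# Truth value of a range: False when empty, True otherwise.
empty = range(0)
huge = range(10**100)  # far more elements than fit in a C ssize_t

empty_is_true = bool(empty)
huge_is_true = bool(huge)  # truth testing does not have to call len()
```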

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu May  1 01:41:22 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Apr 2008 16:41:22 -0700
Subject: [Python-3000] range() issues
In-Reply-To: <4818FD40.5010701@canterbury.ac.nz>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com>
	<ca471dc20804291648j4156facev48f73b268a9f9746@mail.gmail.com>
	<d38f5330804291916ve186314o831409fe1650dfad@mail.gmail.com>
	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz>
	<ca471dc20804301600s58058cd7q9e9fd5290ffef98f@mail.gmail.com>
	<4818FD40.5010701@canterbury.ac.nz>
Message-ID: <ca471dc20804301641h7eaa220fk670baa3c916adff1@mail.gmail.com>

On Wed, Apr 30, 2008 at 4:14 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
> >
> > On Wed, Apr 30, 2008 at 3:49 PM, Greg Ewing <greg.ewing at canterbury.ac.nz>
> wrote:
> >
> >
> >
> > > So that code will silently behave as though the rest of
> > > the sequence wasn't there some of the time?
> > >
> >
> > Only if it uses LBYL.
> >
>
>  I don't understand that. Iteration isn't the only thing
>  one does with sequences. If you have a reason to call
>  len() in the first place, I don't see how having it
>  sometimes return inaccurate results can be helpful.

I've come across situations where len() raising an exception was more
inconvenient than returning a truncated value (e.g. when printing).

> > > Can you elaborate on the rationale for this?
> > >
> >
>
>
> > Ask the designers of the Java collections package.
> >
>
>  Do you mean that they have a rationale which you agree
>  with and think applies to Python as well, or do you
>  mean that you're doing it just because Java does it
>  and they must have a good reason?
>
>  If the former, can you refer me to a document which
>  espouses it?

You'll have to do some research, but I believe the circumstances are
similar -- they have a size() method that is defined to return an
unboxed int, so they are limited by that.

I found the spec here:

http://java.sun.com/j2se/1.4.2/docs/api/java/util/Collection.html#size()

But I didn't find a rationale. I'm sure it was PBP though.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Thu May  1 02:02:22 2008
From: brett at python.org (Brett Cannon)
Date: Wed, 30 Apr 2008 17:02:22 -0700
Subject: [Python-3000] [stdlib-sig] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <fvabi2$q2m$1@ger.gmane.org>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<001401c8a9dd$e82e63c0$c600a8c0@RaymondLaptop1>
	<bbaeab100804291344g1ca48af9s3b5bbdacf516b8d7@mail.gmail.com>
	<fvabi2$q2m$1@ger.gmane.org>
Message-ID: <bbaeab100804301702x68933a7fi46372f3205d8c98b@mail.gmail.com>

On Tue, Apr 29, 2008 at 11:33 PM, Joe Smith <unknown_kev_cat at hotmail.com> wrote:
>
>  "Brett Cannon" <brett at python.org> wrote in message
> news:bbaeab100804291344g1ca48af9s3b5bbdacf516b8d7 at mail.gmail.com...
>
>
>
> > On Tue, Apr 29, 2008 at 2:46 AM, Raymond Hettinger <python at rcn.com> wrote:
> >
> > >
> > > > * UserList/UserString [done: 3.0]
> > > >
> > >
> > >  Note that these were updated and moved to the collections module in
> Py3.0.
> > >
> > >
> >
> > Noted.
> >
> >
> > >
> > >
> > > > anydbm             dbm.tools [1]_
> > > > whichdb            dbm.tools [1]_
> > > >
> > >
> > >  Were there any better naming suggestions than dbm.tools?  The original
> > > names seem much more informative.
> > >
> > >
> >
> > But way too much overhead for two modules that only contained one
> > useful function each. As Nick said, if you don't know DB stuff then I
> > don't see any loss of information.
> >
> > If you can come up with a better name I am open to suggestions, but
> > the module merge will happen.
> >
>
>  Is there a problem having the functions be just dbm.open() and
> dbm.whichdb()? As a user the latter one seems especially logical, as it is a
> tool to help me select which "submodule" I want to use.

There is a general dislike in putting code in a package's __init__
module. Personally I am fine with doing that, but I tried not to do
that with the reorg. If people speak up in support of this then it can
happen.

-Brett

From guido at python.org  Thu May  1 02:08:31 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Apr 2008 17:08:31 -0700
Subject: [Python-3000] [stdlib-sig] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <bbaeab100804301702x68933a7fi46372f3205d8c98b@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<001401c8a9dd$e82e63c0$c600a8c0@RaymondLaptop1>
	<bbaeab100804291344g1ca48af9s3b5bbdacf516b8d7@mail.gmail.com>
	<fvabi2$q2m$1@ger.gmane.org>
	<bbaeab100804301702x68933a7fi46372f3205d8c98b@mail.gmail.com>
Message-ID: <ca471dc20804301708j15832a72g19540b6830ff3be7@mail.gmail.com>

On Wed, Apr 30, 2008 at 5:02 PM, Brett Cannon <brett at python.org> wrote:
>  There is a general dislike in putting code in a package's __init__
>  module. Personally I am fine with doing that, but I tried not to do
>  that with the reorg. If people speak up in support of this then it can
>  happen.

I'm not sure I agree with that sentiment. Quite a few packages have
large __init__.py files. Django routinely puts lots of code there
too.

Even if people prefer not to put (too much) code in __init__.py, a
good compromise might be to put actual implementation code in a
separate submodule, and to put things like

from submodule import *  # submodule.py better define __all__...

or

from submodule import api1, api2, ...

in __init__.py.
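
A runnable sketch of that compromise, with assumptions labelled: the package
name demo_dbm, the submodule _impl, and the placeholder whichdb() body are
all invented for illustration. Note that in Python 3 the import inside
__init__.py has to use the explicit relative form (from ._impl import ...).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk: demo_dbm/_impl.py and demo_dbm/__init__.py.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_dbm"  # hypothetical package name
pkg.mkdir()

# Implementation lives in a submodule, which declares its public API.
(pkg / "_impl.py").write_text(
    "__all__ = ['whichdb']\n"
    "\n"
    "def whichdb(filename):\n"
    "    return 'demo_dbm.dumb'  # placeholder answer\n"
    "\n"
    "def _helper():\n"
    "    pass\n"
)

# __init__.py only re-exports; __all__ in _impl controls what '*' picks up.
(pkg / "__init__.py").write_text("from ._impl import *\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_dbm = importlib.import_module("demo_dbm")

result = demo_dbm.whichdb("example.db")
```

Callers then write demo_dbm.whichdb(...) without knowing about _impl.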

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From musiccomposition at gmail.com  Thu May  1 02:10:44 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Wed, 30 Apr 2008 19:10:44 -0500
Subject: [Python-3000] range() issues
In-Reply-To: <ca471dc20804301641h7eaa220fk670baa3c916adff1@mail.gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<ca471dc20804291648j4156facev48f73b268a9f9746@mail.gmail.com>
	<d38f5330804291916ve186314o831409fe1650dfad@mail.gmail.com>
	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz>
	<ca471dc20804301600s58058cd7q9e9fd5290ffef98f@mail.gmail.com>
	<4818FD40.5010701@canterbury.ac.nz>
	<ca471dc20804301641h7eaa220fk670baa3c916adff1@mail.gmail.com>
Message-ID: <1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com>

On Wed, Apr 30, 2008 at 6:41 PM, Guido van Rossum <guido at python.org> wrote:
>  I've come across situations where len() raising an exception was more
>  inconvenient than returning a truncated value (e.g. when printing).

In those cases, shouldn't you be explicit, catch the overflow
exception, and then use sys.maxsize?
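
The explicit approach suggested here might look like the sketch below. The
helper name clamped_len is made up, and the demonstration relies on CPython
raising OverflowError from len() when the true length does not fit in a C
ssize_t.

```python
import sys


def clamped_len(seq):
    """Return len(seq), or sys.maxsize if the true length overflows."""
    try:
        return len(seq)
    except OverflowError:
        # The caller has opted in to a truncated answer.
        return sys.maxsize


small = clamped_len([1, 2, 3])
large = clamped_len(range(10**100))
```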

>  But I didn't find a rationale. I'm sure it was PBP though.

What's PBP? (A search only turns up a bicycle race. :))


-- 
Cheers,
Benjamin Peterson

From guido at python.org  Thu May  1 02:16:26 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Apr 2008 17:16:26 -0700
Subject: [Python-3000] range() issues
In-Reply-To: <1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<d38f5330804291916ve186314o831409fe1650dfad@mail.gmail.com>
	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz>
	<ca471dc20804301600s58058cd7q9e9fd5290ffef98f@mail.gmail.com>
	<4818FD40.5010701@canterbury.ac.nz>
	<ca471dc20804301641h7eaa220fk670baa3c916adff1@mail.gmail.com>
	<1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com>
Message-ID: <ca471dc20804301716o50b47946tebf082004ae0e08a@mail.gmail.com>

On Wed, Apr 30, 2008 at 5:10 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> On Wed, Apr 30, 2008 at 6:41 PM, Guido van Rossum <guido at python.org> wrote:
>  >  I've come across situations where len() raising an exception was more
>  >  inconvenient than returning a truncated value (e.g. when printing).
>
>  In those cases, shouldn't you be explicit, catch the overflow
>  exception, and then use sys.maxsize?

That's what I did *after* a big run crashed. :-(

>  >  But I didn't find a rationale. I'm sure it was PBP though.
>
>  What's PBP? (A search only turns up a bicycle race. :))

Practicality Beats Purity, from the zen of Python.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From musiccomposition at gmail.com  Thu May  1 02:26:35 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Wed, 30 Apr 2008 19:26:35 -0500
Subject: [Python-3000] range() issues
In-Reply-To: <ca471dc20804301716o50b47946tebf082004ae0e08a@mail.gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz>
	<ca471dc20804301600s58058cd7q9e9fd5290ffef98f@mail.gmail.com>
	<4818FD40.5010701@canterbury.ac.nz>
	<ca471dc20804301641h7eaa220fk670baa3c916adff1@mail.gmail.com>
	<1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com>
	<ca471dc20804301716o50b47946tebf082004ae0e08a@mail.gmail.com>
Message-ID: <1afaf6160804301726m26426081mbf70fc1e2812cc07@mail.gmail.com>

On Wed, Apr 30, 2008 at 7:16 PM, Guido van Rossum <guido at python.org> wrote:
>  >  >  But I didn't find a rationale. I'm sure it was PBP though.
>  >
>  >  What's PBP? (A search only turns up a bicycle race. :))
>
>  Practicality Beats Purity, from the zen of Python

It's practical to have a builtin function silently "lie" about the
length of a sequence? I don't see how that makes anybody's life much
easier.



-- 
Cheers,
Benjamin Peterson

From brett at python.org  Thu May  1 02:34:19 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 1 May 2008 02:34:19 +0200
Subject: [Python-3000] [stdlib-sig] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <ca471dc20804301708j15832a72g19540b6830ff3be7@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<001401c8a9dd$e82e63c0$c600a8c0@RaymondLaptop1>
	<bbaeab100804291344g1ca48af9s3b5bbdacf516b8d7@mail.gmail.com>
	<fvabi2$q2m$1@ger.gmane.org>
	<bbaeab100804301702x68933a7fi46372f3205d8c98b@mail.gmail.com>
	<ca471dc20804301708j15832a72g19540b6830ff3be7@mail.gmail.com>
Message-ID: <bbaeab100804301734y58431a04v7ff4e7b0cd21834f@mail.gmail.com>

On Wed, Apr 30, 2008 at 5:08 PM, Guido van Rossum <guido at python.org> wrote:
> On Wed, Apr 30, 2008 at 5:02 PM, Brett Cannon <brett at python.org> wrote:
>  >  There is a general dislike in putting code in a package's __init__
>  >  module. Personally I am fine with doing that, but I tried not to do
>  >  that with the reorg. If people speak up in support of this then it can
>  >  happen.
>
>  I'm not sure I agree with that sentiment. Quite a few packages have
>  large __init__.py files. Django routinely puts lots of code there
>  too.
>
>  Even if people prefer not to put (too much) code in __init__.py, a
>  good compromise might be to put actual implementation code in a
>  separate submodule, and to put things like
>
>  from submodule import *  # submodule.py better define __all__...
>
>  or
>
>  from submodule import api1, api2, ...
>
>  in __init__.py.

Going through the PEP, the dbm suggestion seems to be the only one that
jumps out at me as possibly benefiting from moving something to the
__init__.py module. I personally don't like putting stuff in another
module and then importing it, as that provides two different module names
to get at the same thing. I prefer there being just a single way to get
at the code.

Anyway, assuming there is no great outcry then I will take Joe's
suggestion as I like that organization more than the current one.

-Brett

From guido at python.org  Thu May  1 02:34:34 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 30 Apr 2008 17:34:34 -0700
Subject: [Python-3000] range() issues
In-Reply-To: <1afaf6160804301726m26426081mbf70fc1e2812cc07@mail.gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz>
	<ca471dc20804301600s58058cd7q9e9fd5290ffef98f@mail.gmail.com>
	<4818FD40.5010701@canterbury.ac.nz>
	<ca471dc20804301641h7eaa220fk670baa3c916adff1@mail.gmail.com>
	<1afaf6160804301710w481daa0cmc0437a212594f1dd@mail.gmail.com>
	<ca471dc20804301716o50b47946tebf082004ae0e08a@mail.gmail.com>
	<1afaf6160804301726m26426081mbf70fc1e2812cc07@mail.gmail.com>
Message-ID: <ca471dc20804301734q33590e08q4813a6d1ff468585@mail.gmail.com>

As I said before, apparently it is practical in the Java world.

On Wed, Apr 30, 2008 at 5:26 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> On Wed, Apr 30, 2008 at 7:16 PM, Guido van Rossum <guido at python.org> wrote:
>  >  >  >  But I didn't find a rationale. I'm sure it was PBP though.
>  >  >
>  >  >  What's PBP? (A search only turns up a bicycle race. :))
>  >
>  >  Practicality Beats Purity, from the zen of Python
>
>  It's practical to have a builtin function silently "lie" about the
>  length of a sequence? I don't see how that makes anybody's life much
>  easier.
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rasky at develer.com  Thu May  1 03:04:35 2008
From: rasky at develer.com (Giovanni Bajo)
Date: Thu, 1 May 2008 01:04:35 +0000 (UTC)
Subject: [Python-3000] Removal of os.path.walk
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>
	<20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de>
	<ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>
Message-ID: <fvb4v3$tfn$1@ger.gmane.org>

On Wed, 30 Apr 2008 16:02:31 -0700, Guido van Rossum wrote:

> There is one use case I can see for an iterator-version of os.listdir()
> (to be named os.opendir()): when globbing a huge directory looking for a
> certain pattern. Using os.listdir() you end up needed enough memory to
> hold all of the names at once. Using os.opendir() you would need only
> enough memory to hold all of the names THAT MATCH.

Not only that, but you can also start processing files one by one without 
having to wait for the whole list to be constructed (which might take 
time over a network file system); in fact, the user might even want to 
abort the operation after a few files were processed, in which case the 
whole directory is not accessed.
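
A sketch of the early-abort case, again assuming os.scandir() from later
Python 3 releases; first_entries is a hypothetical helper. Whether the
operating system actually avoids reading the rest of the directory depends
on the platform and file system, so this only shows the Python side.

```python
import itertools
import os
import tempfile


def first_entries(path, limit):
    """Return the names of at most `limit` entries, then stop scanning."""
    with os.scandir(path) as entries:
        return [entry.name for entry in itertools.islice(entries, limit)]


# Small demonstration on a throwaway directory with five files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(5):
        open(os.path.join(tmp, "file%d" % i), "w").close()
    some = first_entries(tmp, 2)
```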
-- 
Giovanni Bajo
Develer S.r.l.
http://www.develer.com


From ishimoto at gembook.org  Thu May  1 05:06:16 2008
From: ishimoto at gembook.org (atsuo ishimoto)
Date: Thu, 1 May 2008 12:06:16 +0900
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com>
	<4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru>
	<4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru>
	<480612CE.1010300@gmail.com>
	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>
	<4807C3C1.6010602@v.loewis.de>
	<ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>
Message-ID: <797440730804302006r49124aeco3876b6264698f65d@mail.gmail.com>

On Thu, May 1, 2008 at 2:36 AM, Guido van Rossum <guido at python.org> wrote:
> I still like this proposal. I don't quite understand the competing (?)
>  proposal by Stephen Turnbull; perhaps Stephen can compare and contrast
>  the two proposals?

I think Stephen's proposal is not competing with Martin's proposal, but
adds some characters to be hex-escaped as ambiguous.

> And where does Atsuo fall?

Sorry, I cannot understand the word 'fall'; perhaps a colloquial expression?
If you mean 'Hey, Atsuo. Hurry up!', then I have just uploaded a draft
PEP to the Python Wiki.

http://wiki.python.org/moin/Python3kStringRepr

Feedback and suggestions are much appreciated.

From stephen at xemacs.org  Thu May  1 06:06:34 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 01 May 2008 13:06:34 +0900
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <797440730804302006r49124aeco3876b6264698f65d@mail.gmail.com>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com>
	<4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru>
	<4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru>
	<480612CE.1010300@gmail.com>
	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>
	<4807C3C1.6010602@v.loewis.de>
	<ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>
	<797440730804302006r49124aeco3876b6264698f65d@mail.gmail.com>
Message-ID: <87mynazn05.fsf@uwakimon.sk.tsukuba.ac.jp>

atsuo ishimoto writes:

 > > And where does Atsuo fall?
 > 
 > Sorry, I cannot understand word 'fall', perhaps a colloquial expression?

In this case, it means "what is your opinion, compared to Stephen and
Martin?"

 > If you mean 'Hey, Atsuo. Hurry up!', then I have just uploaded draft
 > PEP to Python Wiki.

Great!  I'll take a look tomorrow or Friday.

From martin at v.loewis.de  Thu May  1 07:31:33 2008
From: martin at v.loewis.de ("Martin v. Löwis")
Date: Thu, 01 May 2008 07:31:33 +0200
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>	
	<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>	
	<20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de>
	<ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>
Message-ID: <481955B5.2030805@v.loewis.de>

Guido van Rossum wrote:
> There is one use case I can see for an iterator-version of
> os.listdir() (to be named os.opendir()): when globbing a huge
> directory looking for a certain pattern. Using os.listdir() you end up
> needed enough memory to hold all of the names at once. Using
> os.opendir() you would need only enough memory to hold all of the
> names THAT MATCH.

You would still have to read the entire directory, right?
In that class of use case, there are a number of applications;
e.g. du(1) also wouldn't have to create a list of all files
in the directory, but add the sizes of the files incrementally.

So the question really is whether it is a problem to keep
all file names in memory simultaneously. As Aahz says, the
total memory consumption for a large directory is still
comparatively low, for today's machines.
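
To make the trade-off concrete, here is a rough sketch of both approaches.
It assumes os.scandir(), which only arrived in later Python 3 releases
(3.5, usable as a context manager from 3.6), rather than the os.opendir()
name floated above; the helper names are made up for illustration.

```python
import fnmatch
import os


def matching_names_listdir(path, pattern):
    """Build the complete list of names first, then filter it."""
    return [name for name in os.listdir(path) if fnmatch.fnmatch(name, pattern)]


def matching_names_iter(path, pattern):
    """Yield matching names one at a time; only matches need to be kept."""
    with os.scandir(path) as entries:
        for entry in entries:
            if fnmatch.fnmatch(entry.name, pattern):
                yield entry.name


def total_size(path):
    """du-like total: add up regular file sizes entry by entry."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total
```

Both variants still read every directory entry; the iterating one only
avoids holding all of the names in memory at the same time.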

Regards,
Martin

From greg.ewing at canterbury.ac.nz  Thu May  1 07:33:10 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 01 May 2008 17:33:10 +1200
Subject: [Python-3000] [stdlib-sig] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <bbaeab100804301702x68933a7fi46372f3205d8c98b@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<001401c8a9dd$e82e63c0$c600a8c0@RaymondLaptop1>
	<bbaeab100804291344g1ca48af9s3b5bbdacf516b8d7@mail.gmail.com>
	<fvabi2$q2m$1@ger.gmane.org>
	<bbaeab100804301702x68933a7fi46372f3205d8c98b@mail.gmail.com>
Message-ID: <48195616.9000403@canterbury.ac.nz>

Brett Cannon wrote:
> There is a general dislike in putting code in a package's __init__
> module.

Why? What's the point of having an __init__.py file if
you're not allowed to put any code there?

If it's something that applies to the package as a
whole, that seems like the obvious place to put it.

-- 
Greg

From martin at v.loewis.de  Thu May  1 09:08:48 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 01 May 2008 09:08:48 +0200
Subject: [Python-3000] range() issues
In-Reply-To: <5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>	<e04bdf310804260450y120dc2b5j26abb5ea0be4096a@mail.gmail.com>	<loom.20080426T131056-294@post.gmane.org>	<e04bdf310804261149n1f9c914fo151c0415fc872aab@mail.gmail.com>	<d38f5330804270407w5d915e2cy3812621769cbb45e@mail.gmail.com>	<ca471dc20804281618r79fb8d8bn48e474bfede286fe@mail.gmail.com>	<d38f5330804290730oe40394dr87f9849d89d4851f@mail.gmail.com>	<5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com>	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>
	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
Message-ID: <48196C80.6020608@v.loewis.de>

> These numbers aren't ridiculously large.  I just tried
> 
> for i in range(2**31): pass
> 
> on my (32-bit) laptop: it took 736.8 seconds, or about 12 and a bit minutes.
> (An aside: in contrast,
> 
> for i in range(2**31-1): pass
> 
> took only 131.1 seconds;  looks like there's some potential for optimization
> here....)

No, it means the optimization has already been implemented:

py> iter(range(2**31-1))
<range_iterator object at 0xb7a9b9f8>
py> iter(range(2**31))
<longrange_iterator object at 0xb7a9b968>

IOW, you can iterate over very long ranges, but doing so will be much
slower (per element) than iterating over a short range.
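
A quick way to check which iterator you get, as a sketch only: the type
names are CPython implementation details, and the cut-over point depends
on the size of the platform's C long, so the large range below starts at
2**100 to be past it on any build. iterator_kind is a hypothetical helper.

```python
def iterator_kind(r):
    """Return the type name of the iterator that a range object produces."""
    return type(iter(r)).__name__


small = range(10)
huge = range(2**100, 2**100 + 3)  # bounds exceed any C long

print(iterator_kind(small))
print(iterator_kind(huge))

# Whichever iterator is used, the values produced are the same ints.
print(list(huge) == [2**100, 2**100 + 1, 2**100 + 2])
```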

Regards,
Martin

From ncoghlan at gmail.com  Thu May  1 12:20:04 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 01 May 2008 20:20:04 +1000
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <481955B5.2030805@v.loewis.de>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>		<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>		<20080430144804.GA26439@panix.com>
	<4818F7C3.7060806@v.loewis.de>	<ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>
	<481955B5.2030805@v.loewis.de>
Message-ID: <48199954.4000800@gmail.com>

Martin v. Löwis wrote:
> Guido van Rossum wrote:
>> There is one use case I can see for an iterator-version of
>> os.listdir() (to be named os.opendir()): when globbing a huge
>> directory looking for a certain pattern. Using os.listdir() you end up
>> needing enough memory to hold all of the names at once. Using
>> os.opendir() you would need only enough memory to hold all of the
>> names THAT MATCH.
> 
> You would still have to read the entire directory, right?
> In that kind of class, there are a number of applications;
> e.g. du(1) also wouldn't have to create a list of all files
> in the directory, but add the sizes of the files incrementally.
> 
> So the question really is whether it is a problem to keep
> all file names in memory simultaneously. As Aahz says, the
> total memory consumption for a large directory is still
> comparatively low, for today's machines.

I think Giovanni's point is an important one as well - with an iterator, 
you can pipeline your operations far more efficiently, since you don't 
have to wait for the whole directory listing before doing anything (e.g. 
if you're doing some kind of move/rename operation on a directory, you 
can start copying the first file to its new location without having to 
wait for the directory read to finish).

Reducing the startup delays of an operation can be a very useful thing 
when it comes to providing a user with a good feeling of responsiveness 
from an application (and if it allows the application to more 
effectively pipeline something, there may be an actual genuine 
improvement in responsiveness, rather than just the appearance of one).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From richard at tartarus.org  Thu May  1 13:36:37 2008
From: richard at tartarus.org (Richard Boulton)
Date: Thu, 01 May 2008 12:36:37 +0100
Subject: [Python-3000] range() issues
In-Reply-To: <48196C80.6020608@v.loewis.de>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>	<e04bdf310804260450y120dc2b5j26abb5ea0be4096a@mail.gmail.com>	<loom.20080426T131056-294@post.gmane.org>	<e04bdf310804261149n1f9c914fo151c0415fc872aab@mail.gmail.com>	<d38f5330804270407w5d915e2cy3812621769cbb45e@mail.gmail.com>	<ca471dc20804281618r79fb8d8bn48e474bfede286fe@mail.gmail.com>	<d38f5330804290730oe40394dr87f9849d89d4851f@mail.gmail.com>	<5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com>	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
	<48196C80.6020608@v.loewis.de>
Message-ID: <4819AB45.3060606@tartarus.org>

Martin v. Löwis wrote:
>> These numbers aren't ridiculously large.  I just tried
>>
>> for i in range(2**31): pass
>>
>> on my (32-bit) laptop: it took 736.8 seconds, or about 12 and a bit minutes.
>> (An aside: in contrast,
>>
>> for i in range(2**31-1): pass
>>
>> took only 131.1 seconds;  looks like there's some potential for optimization
>> here....)

There's always potential for optimization ... just a question of whether
it's worth the increased coding (and maintenance) effort.

> No, it means the optimization has already been implemented:
> 
> py> iter(range(2**31-1))
> <range_iterator object at 0xb7a9b9f8>
> py> iter(range(2**31))
> <longrange_iterator object at 0xb7a9b968>
> 
> IOW, you can iterate over very long ranges, but doing so will be much
> slower (per element) than iterating over a short range.

In the slow example given, only one of the returned items needs to be a
long, so a possible further optimisation which would work well for this
case would be to automatically split the range into two parts - the part
which only needs short integers, and the part which needs longs, and
have a "mixedrange_iterator" type which returned all the items from one
of these, followed by all the items from the other.  In the general
case, there might need to be three such sub-iterators used:
range(-2**32, 2**32), for example, would be decomposed into
range(-2**32, -2**31) + range(-2**31, 2**31) + range(2**31, 2**32)
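
For what it's worth, the split can be sketched in pure Python as below.
This only illustrates the idea and is not how range is implemented. It
assumes a 32-bit C long (as on the laptop in the example) and a step of 1,
and since ranges are half-open the pieces meet at -2**31 and 2**31.
split_range and the two constants are made-up names.

```python
from itertools import chain

C_LONG_MIN = -2**31      # assumed 32-bit C long
C_LONG_MAX = 2**31 - 1


def split_range(start, stop):
    """Split range(start, stop) into up to three consecutive sub-ranges:
    values below C_LONG_MIN, values that fit in a C long, values above."""
    lo_cut = min(max(start, C_LONG_MIN), stop)
    hi_cut = max(min(stop, C_LONG_MAX + 1), lo_cut)
    parts = [range(start, lo_cut), range(lo_cut, hi_cut), range(hi_cut, stop)]
    # Compare bounds rather than calling len(), which can overflow
    # for very long ranges.
    return [p for p in parts if p.start < p.stop]


def mixed_range(start, stop):
    """Iterate over the sub-ranges one after the other."""
    return chain.from_iterable(split_range(start, stop))
```

Only the middle piece could use the fast iterator; the outer pieces would
still need the slow one.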

Not saying it's worth doing this optimisation, particularly, but I'm
going to guess that these are the lines the grandparent poster was
thinking along.

-- 
Richard

From martin at v.loewis.de  Thu May  1 15:52:23 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 01 May 2008 15:52:23 +0200
Subject: [Python-3000] range() issues
In-Reply-To: <4819A506.6090807@lemurconsulting.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>	<e04bdf310804260450y120dc2b5j26abb5ea0be4096a@mail.gmail.com>	<loom.20080426T131056-294@post.gmane.org>	<e04bdf310804261149n1f9c914fo151c0415fc872aab@mail.gmail.com>	<d38f5330804270407w5d915e2cy3812621769cbb45e@mail.gmail.com>	<ca471dc20804281618r79fb8d8bn48e474bfede286fe@mail.gmail.com>	<d38f5330804290730oe40394dr87f9849d89d4851f@mail.gmail.com>	<5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com>	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
	<48196C80.6020608@v.loewis.de>
	<4819A506.6090807@lemurconsulting.com>
Message-ID: <4819CB17.2050109@v.loewis.de>

> In the slow example given, only one of the returned items needs to be a
> long

This is Py3k. They are all longs.

Regards,
Martin

From aahz at pythoncraft.com  Thu May  1 16:25:24 2008
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 1 May 2008 07:25:24 -0700
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <481955B5.2030805@v.loewis.de>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>
	<20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de>
	<ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>
	<481955B5.2030805@v.loewis.de>
Message-ID: <20080501142524.GA3546@panix.com>

On Thu, May 01, 2008, "Martin v. Löwis" wrote:
> Guido van Rossum wrote:
>>
>> There is one use case I can see for an iterator-version of
>> os.listdir() (to be named os.opendir()): when globbing a huge
>> directory looking for a certain pattern. Using os.listdir() you end up
>> needing enough memory to hold all of the names at once. Using
>> os.opendir() you would need only enough memory to hold all of the
>> names THAT MATCH.
> 
> You would still have to read the entire directory, right?  In that
> kind of class, there are a number of applications; e.g. du(1) also
> wouldn't have to create a list of all files in the directory, but add
> the sizes of the files incrementally.

Actually, the primary application I'm thinking of is a CGI that displays
part of a directory listing (paged) for manual processing of individual
files.
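
Roughly what I have in mind, sketched with os.scandir() (which only exists
in later Python 3 releases, so treat it as a stand-in for an iterating
listdir) and itertools.islice. directory_page and its parameters are
hypothetical names.

```python
import os
from itertools import islice


def directory_page(path, page, page_size=50):
    """Return the names on one page (0-based) without keeping the rest."""
    start = page * page_size
    with os.scandir(path) as entries:
        return [entry.name for entry in islice(entries, start, start + page_size)]
```

Earlier entries are still read and skipped, but only one page of names is
ever held at once. Directory order is whatever the file system returns, so
pages are only consistent while the directory is not being modified.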

> So the question really is whether it is a problem to keep all file
> names in memory simultaneously. As Aahz says, the total memory
> consumption for a large directory is still comparatively low, for
> today's machines.

Only for a single process.  Throw together three or ten processes, and
it adds up.  As I said, not a huge problem, but definitely the potential
for pain.
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Help a hearing-impaired person: http://rule6.info/hearing.html

From ncoghlan at gmail.com  Thu May  1 16:41:57 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 02 May 2008 00:41:57 +1000
Subject: [Python-3000] range() issues
In-Reply-To: <4819CB17.2050109@v.loewis.de>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>	<e04bdf310804260450y120dc2b5j26abb5ea0be4096a@mail.gmail.com>	<loom.20080426T131056-294@post.gmane.org>	<e04bdf310804261149n1f9c914fo151c0415fc872aab@mail.gmail.com>	<d38f5330804270407w5d915e2cy3812621769cbb45e@mail.gmail.com>	<ca471dc20804281618r79fb8d8bn48e474bfede286fe@mail.gmail.com>	<d38f5330804290730oe40394dr87f9849d89d4851f@mail.gmail.com>	<5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com>	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>	<48196C80.6020608@v.loewis.de>	<4819A506.6090807@lemurconsulting.com>
	<4819CB17.2050109@v.loewis.de>
Message-ID: <4819D6B5.3060905@gmail.com>

Martin v. Löwis wrote:
>> In the slow example given, only one of the returned items needs to be a
>> long
> 
> This is Py3k. They are all longs.

Not inside the object they aren't - I believe the optimised one uses C 
longs internally, and converts to a Python long when it returns the 
values, whereas 'longrange' uses Python long objects internally as well. 
Oddly enough, this is going to make the increment/decrement operations 
for the counter quite a bit slower :)

One way to optimise this (since all we need to support here is counting 
rather than arbitrary arithmetic) would be for the longrange iterator to 
use some simple pure C fixed point arithmetic internally to keep track 
of an arbitrarily long counter, and only convert to a Python long when 
it has to (just like the optimised shortrange iterator).

I'm not sure it is worth the hassle though.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From dickinsm at gmail.com  Thu May  1 16:59:38 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Thu, 1 May 2008 10:59:38 -0400
Subject: [Python-3000] range() issues
In-Reply-To: <4819D6B5.3060905@gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<ca471dc20804281618r79fb8d8bn48e474bfede286fe@mail.gmail.com>
	<d38f5330804290730oe40394dr87f9849d89d4851f@mail.gmail.com>
	<5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com>
	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>
	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
	<48196C80.6020608@v.loewis.de> <4819A506.6090807@lemurconsulting.com>
	<4819CB17.2050109@v.loewis.de> <4819D6B5.3060905@gmail.com>
Message-ID: <5c6f2a5d0805010759w8610ff0oc6e3c4e7aa2c9fc5@mail.gmail.com>

On Thu, May 1, 2008 at 10:41 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:

> One way to optimise this (since all we need to support here is counting
> rather than arbitrary arithmetic) would be for the longrange iterator to use
> some simple pure C fixed point arithmetic internally to keep track of an
> arbitrarily long counter, and only convert to a Python long when it has to
> (just like the optimised shortrange iterator).
>

Stop already!  It was an ill-considered, throwaway comment, and I apologise
for making it.


> I'm not sure it is worth the hassle though.
>

Indeed.   Using such a large range is almost certainly not common enough to
make it worth optimising...

Mark

From collinw at gmail.com  Thu May  1 16:41:35 2008
From: collinw at gmail.com (Collin Winter)
Date: Thu, 1 May 2008 07:41:35 -0700
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
Message-ID: <43aa6ff70805010741m1a39e740ic5e42e2c74109b03@mail.gmail.com>

On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon <brett at python.org> wrote:
> [bcc to stdlib-sig]
>
>  After two false starts over the YEARS of trying to clean up and
>  reorganize the stdlib, creating a SIG to get this going, having Guido
>  give the PEP the once-over over the past several days, and creating
>  two new bug reports (issues 2715 and 2716), PEP 3108 is finally ready
>  for public vetting!
>
>  While reading this PEP, do remember this is only about either removing
>  modules, renaming them, or moving them into a package. Additions are
>  not covered by this PEP!
>
>  Also realize all of the right people have been consulted on this stuff
>  (e.g., the web SIG about the urllib package). So please do not think
>  that something that seems drastic (e.g., the removal of all
>  Mac-specific modules) was taken lightly when in fact the proper people
>  were asked and they were okay with what is going on.
>
>  Lastly, I do not want this to turn into a drawn-out thread about how
>  people think some module should stay because they happen to use it or
>  suggest some other module to remove. Please think before you propose a
>  change. I have been through this proposal process for this reorg
>  before and every time it has gotten way out of control. I do not want
>  it to happen this time.
>
>  OK, with all of that out of the way, here is the PEP:
>  -----------------------------------------------
>
>  PEP: 3108
>  Title: Standard Library Reorganization
>  Version: $Revision: 62573 $
>  Last-Modified: $Date: 2008-04-28 17:56:36 -0700 (Mon, 28 Apr 2008) $
>  Author: Brett Cannon <brett at python.org>
>  Status: Draft
>  Type: Standards Track
>  Content-Type: text/x-rst
>  Created: 01-Jan-2007
>  Python-Version: 3.0
>  Post-History:
>
>
>  Abstract
>  ========
>
>  Just like the language itself, Python's standard library (stdlib) has
>  grown over the years to be very rich.  But over time some modules
>  have lost their need to be included with Python.  There has also been
>  an introduction of a naming convention for modules since Python's
>  inception that not all modules follow.
>
>  Python 3.0 has presented a chance to remove modules that do not have
>  long term usefulness.  This chance also allows for the renaming of
>  modules so that they follow the Python style guide [#pep-0008]_.  This
>  PEP lists modules that should not be included in Python 3.0 and what
>  modules need to be renamed.
>
>
>  Modules to Remove
>  =================
>
>  Guido pronounced that "silly old stuff" is to be deleted from the
>  stdlib for Py3K [#silly-old-stuff]_.  This is open-ended on purpose.
>  Each module to be removed needs to have a justification as to why it
>  should no longer be distributed with Python.  This can range from the
>  module being deprecated in Python 2.x to being for a platform that is
>  no longer widely used.
>
>  This section of the PEP lists the various modules to be removed. Each
>  subsection represents a different reason for modules to be
>  removed.  Each module must have a specific justification on top of
>  being listed in a specific subsection so as to make sure only modules
>  that truly deserve to be removed are in fact removed.
>
>  When a reason mentions how long it has been since a module has been
>  "uniquely edited", it is in reference to how long it has been since a
>  checkin was done specifically for the module and not for a change that
>  applied universally across the entire stdlib.  If an edit time is not
>  denoted as "unique" then it is the last time the file was edited,
>  period.
>
>  The procedure to thoroughly remove a module is:
>
>  #. Remove the module.
>  #. Remove the tests.
>  #. Edit ``Modules/Setup.dist`` and ``setup.py`` if needed.
>  #. Remove the docs (if applicable).
>  #. Run the regression test suite (using ``-uall``); watch out for
>    tests that are skipped because an import failed for the removed
>    module.
>
>  If a deprecation warning is added to 2.6, it would be better to make
>  all the changes to 2.6, merge the changes into the 3k branch, then
>  perform the procedure above.  This will avoid some merge conflicts.
>
>
>  Previously deprecated
>  ---------------------
>
>  PEP 4 lists all modules that have been deprecated in the stdlib
>  [#pep-0004]_.  The specified motivations mirror those listed in
>  PEP 4. All modules listed
>  in the PEP at the time of the first alpha release of Python 3.0 will
>  be removed.
>
>  The entire contents of lib-old will also be removed.  These modules
>  have already been removed from being imported but are kept in the
>  distribution for Python for users that rely upon the code.
>
>  * buildtools
>
>     + Documented as deprecated since Python 2.3 without an explicit
>       reason.
>
>  * cfmfile
>
>     + Documented as deprecated since Python 2.4 without an explicit
>       reason.
>
>  * cl
>
>     + Documented as obsolete since Python 2.0 or earlier.
>     + Interface to SGI hardware.
>
>  * md5
>
>     + Supplanted by the ``hashlib`` module.
>
>  * mimetools
>
>     + Documented as obsolete without an explicit reason.
>
>  * MimeWriter
>
>     + Supplanted by the ``email`` package.
>
>  * mimify
>
>     + Supplanted by the ``email`` package.
>
>  * multifile
>
>     + Supplanted by the ``email`` package.
>
>  * posixfile
>
>     + Locking is better done by ``fcntl.lockf()``.
>
>  * rfc822
>
>     + Supplanted by the ``email`` package.
>
>  * sha
>
>     + Supplanted by the ``hashlib`` module.
>
>  * sv
>
>     + Documented as obsolete since Python 2.0 or earlier.
>     + Interface to obsolete SGI Indigo hardware.
>
>  * timing
>
>     + Documented as obsolete since Python 2.0 or earlier.
>     + ``time.clock()`` gives better time resolution.
>
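As an illustration of the ``hashlib`` replacement for the md5 and sha
modules listed above, here is a minimal sketch. The wrapper names md5_hex
and sha1_hex are made up for the example; only hashlib.md5(),
hashlib.sha1() and hexdigest() come from the stdlib.

```python
import hashlib


def md5_hex(data):
    """Hex digest that the old md5 module would have produced."""
    return hashlib.md5(data).hexdigest()


def sha1_hex(data):
    """Hex digest that the old sha module would have produced."""
    return hashlib.sha1(data).hexdigest()


print(md5_hex(b"abc"))
print(sha1_hex(b"abc"))
```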
>
>  Platform-specific with minimal use
>  ----------------------------------
>
>  Python supports many platforms, some of which are not widely used.
>  And on some of these platforms there are modules that have limited use
>  to people on those platforms.  Because of their limited usefulness it
>  would be better to no longer burden the Python development team with
>  their maintenance.
>
>  The modules mentioned below are documented. All undocumented modules
>  for the specified platforms will also be removed.
>
>  IRIX
>  /////
>  The IRIX operating system is no longer produced [#irix-retirement]_.
>  Removing all modules from the plat-irix[56] directory has been deemed
>  reasonable because of this fact.
>
>   + AL/al [done: 3.0]
>
>     - Provides sound support on Indy and Indigo workstations.
>     - Both workstations are no longer available.
>     - Code has not been uniquely edited in three years.
>
>   + cd [done: 3.0]
>
>     - CD drive control for SGI systems.
>     - SGI no longer sells machines with IRIX on them.
>     - Code has not been uniquely edited in 14 years.
>
>   + cddb [done: 3.0]
>
>     - Undocumented.
>
>   + cdplayer [done: 3.0]
>
>     - Undocumented.
>
>   + cl/CL/CL_old [done: 3.0]
>
>     - Compression library for SGI systems.
>     - SGI no longer sells machines with IRIX on them.
>     - Code has not been uniquely edited in 14 years.
>
>   + DEVICE/GL/gl/cgen/cgensupport [done: 3.0]
>
>     - GL access, which is the predecessor to OpenGL.
>     - Has not been edited in at least eight years.
>     - Third-party libraries provide better support (PyOpenGL [#pyopengl]_).
>
>   + ERRNO [done: 3.0]
>
>     - Undocumented.
>
>   + FILE [done: 3.0]
>
>     - Undocumented.
>
>   + FL/fl/flp [done: 3.0]
>
>     - Wrapper for the FORMS library [#irix-forms]_
>     - FORMS has not been edited in 12 years.
>     - Library is not widely used.
>     - First eight hits on Google are for Python docs for fl.
>
>   + fm [done: 3.0]
>
>     - Wrapper to the IRIS Font Manager library.
>     - Only available on SGI machines which no longer come with IRIX.
>
>   + GET [done: 3.0]
>
>     - Undocumented.
>
>   + GLWS [done: 3.0]
>
>     - Undocumented.
>
>   + imgfile [done: 3.0]
>
>     - Wrapper for SGI libimage library for imglib image files
>       (``.rgb`` files).
>     - Python Imaging Library provides read-only support [#pil]_.
>     - Not uniquely edited in 13 years.
>
>   + IN [done: 3.0]
>
>     - Undocumented.
>
>   + IOCTL [done: 3.0]
>
>     - Undocumented.
>
>   + jpeg [done: 3.0]
>
>     - Wrapper for JPEG (de)compressor.
>     - Code not uniquely edited in nine years.
>     - Third-party libraries provide better support
>       (Python Imaging Library [#pil]_).
>
>   + panel [done: 3.0]
>
>     - Undocumented.
>
>   + panelparser [done: 3.0]
>
>     - Undocumented.
>
>   + readcd [done: 3.0]
>
>     - Undocumented.
>
>   + SV [done: 3.0]
>
>     - Undocumented.
>
>   + torgb [done: 3.0]
>
>     - Undocumented.
>
>   + WAIT [done: 3.0]
>
>     - Undocumented.
>
>
>  Mac-specific modules
>  ////////////////////
>
>  The Mac-specific modules are mostly unmaintained (e.g., the bgen
>  tool used to auto-generate many of the modules has never been
>  updated to support UCS-4). It is also not Python's place to maintain
>  such a large amount of OS-specific modules. Thus all modules under
>  plat-mac are to be removed.
>
>  A stub module for proxy access will be provided for use by urllib.
>
>  * _builtinSuites
>
>     - Undocumented.
>     - Package under lib-scriptpackages.
>
>  * Audio_mac
>
>     - Undocumented.
>
>  * aepack
>
>     - OSA support is better through third-party modules.
>
>         * Appscript [#appscript]_.
>
>     - Hard-coded endianness which breaks on Intel Macs.
>     - Might need to rename if Carbon package dependent.
>
>  * aetools
>
>     - See aepack.
>
>  * aetypes
>
>     - See aepack.
>
>  * applesingle
>
>     - Undocumented.
>     - AppleSingle is a binary file format for A/UX.
>     - A/UX no longer distributed.
>
>  * appletrawmain
>
>     - Undocumented.
>
>  * appletrunner
>
>     - Undocumented.
>
>  * argvemulator
>
>     - Undocumented.
>
>  * autoGIL
>
>     - Very bad model for using Python with the CFRunLoop.
>
>  * bgenlocations
>
>     - Undocumented.
>
>  * bundlebuilder
>
>     - Undocumented.
>
>  * Carbon
>
>     - Carbon development has stopped.
>     - Does not support 64-bit systems completely.
>     - Dependent on bgen which has never been updated to support UCS-4
>       Unicode builds of Python.
>
>  * CodeWarrior
>
>     - Undocumented.
>     - Package under lib-scriptpackages.
>
>  * ColorPicker
>
>     - Better to use Cocoa for GUIs.
>
>  * EasyDialogs
>
>     - Better to use Cocoa for GUIs.
>
>  * Explorer
>
>     - Undocumented.
>     - Package under lib-scriptpackages.
>
>  * Finder
>
>     - Undocumented.
>     - Package under lib-scriptpackages.
>
>
>  * findertools
>
>     - No longer useful.
>
>  * FrameWork
>
>     - Poorly documented.
>     - Not updated to support Carbon Events.
>
>  * gensuitemodule
>
>     - See aepack.
>
>  * ic
>
>  * icopen
>
>     - Not needed on OS X.
>     - Meant to replace 'open' which is usually a bad thing to do.
>
>  * macerrors
>
>     - Undocumented.
>
>  * MacOS
>
>     - Would also mean the removal of binhex.
>
>  * macostools
>
>  * macresource
>
>     - Undocumented.
>
>  * MiniAEFrame
>
>     - See aepack.
>
>  * Nav
>
>     - Undocumented.
>
>  * Netscape
>
>     - Undocumented.
>     - Package under lib-scriptpackages.
>
>
>  * pimp
>
>     - Undocumented.
>
>  * PixMapWrapper
>
>     - Undocumented.
>
>  * StdSuites
>
>     - Undocumented.
>     - Package under lib-scriptpackages.
>
>  * SystemEvents
>
>     - Undocumented.
>     - Package under lib-scriptpackages.
>
>  * Terminal
>
>     - Undocumented.
>     - Package under lib-scriptpackages.
>
>
>  * terminalcommand
>
>     - Undocumented.
>
>  * videoreader
>
>      - No longer used.
>
>  * W
>
>      - No longer distributed with Python.
>
>
>  .. _PyObjC: http://pyobjc.sourceforge.net/
>
>
>  Solaris
>  ///////
>
>   + SUNAUDIODEV/sunaudiodev [done: 3.0]
>
>     - Access to the sound card on Sun machines.
>     - Code not uniquely edited in over eight years.
>
>
>  Hardly used
>  ------------
>
>  Some modules that are platform-independent are hardly used.  This
>  can be from how easy it is to implement the functionality from scratch
>  or because the audience for the code is very small.
>
>  * audiodev [done: 3.0]
>
>   + Undocumented.
>   + Not edited in five years.
>   + If removed sunaudio should go as well (also undocumented; not
>     edited in over seven years).
>
>  * imputil
>
>   + Undocumented.
>   + Never updated to support absolute imports.
>
>  * mutex
>
>   + Easy to implement using a semaphore and a queue.
>   + Cannot block on a lock attempt.
>   + Not uniquely edited since its addition 15 years ago.
>   + Only useful with the 'sched' module.
>   + Not thread-safe.
>
>
>  * stringold [done: 3.0]
>
>   + Function versions of the methods on string objects.
>   + Obsolete since Python 1.6.
>   + Any functionality not in the string object or module will be moved
>     to the string module (mostly constants).
>
>  * symtable/_symtable
>
>   + Undocumented.
>
>  * toaiff [done: 3.0, moved to Demo]
>
>   + Undocumented.
>   + Requires ``sox`` library to be installed on the system.
>
>  * user
>
>   + Easily handled by allowing the application to specify its own
>     module name, check for existence, and import if found.
>
>  * new [done: 3.0]
>
>   + Just a rebinding of names from the 'types' module.
>   + Can also call ``type`` built-in to get most types easily.
>   + Docstring states the module is no longer useful as of revision
>     27241 (2002-06-15).
>
>  * pure [done: 3.0]
>
>   + Written before Pure Atria was bought by Rational which was then
>     bought by IBM (in other words, very old).
>
>  * test.testall [done: 3.0]
>
>   + From the days before regrtest.
>
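The suggested replacement for the ``user`` module (let the application
name its own customisation module, check that it exists, and import it if
found) might look something like this. It is a sketch that leans on
importlib.util.find_spec(), which is from much later Python 3 releases;
load_optional_module and the module names are hypothetical.

```python
import importlib
import importlib.util


def load_optional_module(name):
    """Import the top-level module `name` if it exists, else return None."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


# An application would pass its own customisation module name here.
settings = load_optional_module("myapp_user_settings_does_not_exist")
print(settings)
```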
>
>  Obsolete
>  --------
>
>  Becoming obsolete signifies that either another module in the stdlib
>  or a widely distributed third-party library provides a better solution
>  for what the module is meant for.
>
>  * Bastion/rexec [done: 3.0]
>
>   + Restricted execution / security.
>   + Turned off in Python 2.3.
>   + Modules deemed unsafe.
>
>  * bsddb185 [done: 3.0]
>
>   + Superseded by bsddb3.
>   + Not built by default.
>   + Documentation specifies that the "module should never be used
>     directly in new code".
>
>  * commands
>
>   + subprocess module replaces it [#pep-0324]_.
>   + Remove getstatus(), move rest to subprocess.
>
>  * compiler (need to add AST -> bytecode mechanism) [done: 3.0]
>
>   + Having to maintain both the built-in compiler and the stdlib
>     package is redundant [#ast-removal]_.
>   + The AST created by the compiler is available [#ast]_.
>   + Mechanism to compile from an AST needs to be added.
>
>  * dircache
>
>   + Negligible use.
>   + Easily replicated.
>
>  * dl [done: 3.0]
>
>   + ctypes provides better support for same functionality.
>
>  * fpformat
>
>   + All functionality is supported by string interpolation.
>
>  * htmllib
>
>   + Superseded by HTMLParser.
>
>  * ihooks
>
>   + Undocumented.
>   + For use with rexec which has been turned off since Python 2.3.
>
>  * imageop [done: 3.0]
>
>   + Better support by third-party libraries
>     (Python Imaging Library [#pil]_).
>   + Unit tests relied on rgbimg and imgfile.
>         - rgbimg was removed in Python 2.6.
>         - imgfile slated for removal in this PEP. [done: 3.0]
>
>  * linuxaudiodev [done: 3.0]
>
>   + Replaced by ossaudiodev.
>
>  * mhlib
>
>   + Obsolete mailbox format.
>
>  * popen2 [done: 3.0]
>
>   + subprocess module replaces them [#pep-0324]_.
>
>  * sched
>
>   + Replaced by threading.Timer.
>
>
>  * sgmllib
>
>   + Does not fully parse SGML.
>   + Only in the stdlib to support htmllib, which is slated for removal.
>
>  * stat
>
>   + ``os.stat`` now returns a tuple with attributes.
>   + Functions in the module should be made into methods for the object
>     returned by os.stat.
>
>  * statvfs
>
>   + ``os.statvfs`` now returns a tuple with attributes.
>
>  * thread
>
>   + People should use 'threading' instead.
>
>     - Rename 'thread' to _thread.
>     - Deprecate dummy_thread and rename _dummy_thread.
>     - Move thread.get_ident over to threading.
>
>   + Guido has previously supported the deprecation
>     [#thread-deprecation]_.
>
>  * urllib
>
>   + Superseded by urllib2.
>   + Functionality unique to urllib will be kept in the
>     `urllib package`_.
>
>  * UserDict [done: 3.0]
>
>   + Not as useful since types can be a superclass.
>   + Useful bits moved to the 'collections' module.
>
>  * UserList/UserString [done: 3.0]
>
>   + Not useful since types can be a superclass.
>
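A few of the replacements named in the list above, sketched side by side:
``subprocess`` in place of ``commands``/``popen2``, string interpolation in
place of ``fpformat``, and the named attributes on the ``os.stat`` result.
The function names are invented for the example, and subprocess.run() with
text=True is from later Python 3 releases than the one this PEP targets.

```python
import os
import subprocess
import sys


def run_and_capture(args):
    """Rough stand-in for commands.getstatusoutput(): (status, output)."""
    result = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return result.returncode, result.stdout.rstrip("\n")


def fixed_point(value, digits):
    """Fixed-point formatting via interpolation, in the spirit of fpformat.fix()."""
    return "%.*f" % (digits, value)


def file_size(path):
    """Read the size through a named attribute instead of a tuple index."""
    return os.stat(path).st_size


print(run_and_capture([sys.executable, "-c", "print('hello')"]))
print(fixed_point(3.14159, 2))
```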
>
>  Modules to Rename
>  =================
>
>  Along with the stdlib gaining some modules that are no longer
>  relevant, there is also the issue of naming.  Many modules existed in
>  the stdlib before PEP 8 came into existence [#pep-0008]_.  This has
>  led to some naming inconsistencies and namespace bloat that should be
>  addressed.
>
>
>  PEP 8 violations
>  ----------------
>
>  PEP 8 specifies that modules "should have short, all-lowercase names"
>  where "underscores can be used ... if it improves readability"
>  [#pep-0008]_.  The use of underscores is discouraged in package names.
>  The following modules violate PEP 8 and are not somehow being renamed
>  by being moved to a package.
>
>  ==================  ==================================================
>  Current Name        Replacement Name
>  ==================  ==================================================
>  _winreg             winreg (rename also because module has a public
>                     interface and thus should not have a leading
>                     underscore)
>  ConfigParser        configparser
>  copy_reg            copyreg
>  PixMapWrapper       pixmapwrapper
>  Queue               queue
>  SocketServer        socketserver
>  ==================  ==================================================
>
>
>  Merging C and Python implementations of the same interface
>  ----------------------------------------------------------
>
>  Several interfaces have both a Python and C implementation.  While it
>  is great to have a C implementation for speed with a Python
>  implementation as fallback, there is no need to expose the two
>  implementations independently in the stdlib.  For Python 3.0 all
>  interfaces with two implementations will be merged into a single
>  public interface.
>
>  The C module is to be given a leading underscore to delineate the fact
>  that it is not the reference implementation (the Python implementation
>  is).  This means that any semantic difference between the C and Python
>  versions must be dealt with before Python 3.0 or else the C
>  implementation will be removed until it can be fixed.
>
>  One interface that is not listed below is xml.etree.ElementTree.  This
>  is an externally maintained module and thus is not under the direct
>  control of the Python development team for renaming.  See `Open
>  Issues`_ for a discussion on this.
>
>  * pickle/cPickle
>
>   + Rename cPickle to _pickle.
>   + Semantic completeness of C implementation *not* verified.
>
>  * profile/cProfile
>
>   + Rename cProfile to _profile.
>   + Semantic completeness of C implementation *not* verified.
>
>  * StringIO/cStringIO [done: 3.0]
>
>   + Add the class to the 'io' module.
>
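A minimal sketch of the merged layout described above, assuming a
hypothetical accelerator module named ``_checksum`` (the name and the
function are made up for illustration). The pure-Python definition is
the reference, and the C version replaces it only when it can be
imported:

```python
def checksum(data):
    """Pure-Python reference implementation (illustrative only)."""
    return sum(data) % 256


try:
    # Hypothetical C accelerator; if it exists, its checksum() shadows
    # the reference implementation defined above.
    from _checksum import checksum
except ImportError:
    # No accelerator available: keep the pure-Python version.
    pass


print(checksum(b"\x01\x02"))
```

Callers import only the public name and never need to know which
implementation they got.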
>
>  No public, documented interface
>  -------------------------------
>
>  There are several modules in the stdlib that have no defined public
>  interface.  These modules exist as support code for other modules that
>  are exposed.  Because they are not meant to be used directly they
>  should be renamed to reflect this fact.
>
>  ============  ===============================
>  Current Name  Replacement Name
>  ============  ===============================
>  markupbase    _markupbase [done: 3.0]
>  dummy_thread  _dummy_thread [#]_
>  ============  ===============================
>
>  .. [#] Assumes ``thread`` is renamed to ``_thread``.
>
>
>  Poorly chosen names
>  -------------------
>
>  A few modules have names that were poorly chosen in hindsight.  They
>  should be renamed so as to prevent their bad name from perpetuating
>  beyond the 2.x series.
>
>  =================  ===============================
>  Current Name       Replacement Name
>  =================  ===============================
>  repr               reprlib
>  test.test_support  test.support
>  =================  ===============================
>
>
>  Grouping of modules
>  -------------------
>
>  As the stdlib has grown, several areas within it have expanded to
>  include multiple modules (e.g., dbm support). Thus some new packages
>  make sense where the renaming makes a module's name easier to work
>  with.
>
>
>  dbm package
>  ///////////
>
>  =================  ===============================
>  Current Name       Replacement Name
>  =================  ===============================
>  anydbm             dbm.tools [1]_
>  dbhash             dbm.bsd
>  dbm                dbm.ndbm
>  dumbdbm            dbm.dumb
>  gdbm               dbm.gnu
>  whichdb            dbm.tools [1]_
>  =================  ===============================
>
>
>  .. [1] ``dbm.tools`` can combine ``anydbm`` and ``whichdb`` since the public
>        API for both modules has no name conflict and the two modules have
>        closely related usage.
>
>
>
>  html package
>  ////////////
>
>  ==================  ===============================
>  Current Name        Replacement Name
>  ==================  ===============================
>  HTMLParser          html.parser
>  htmlentitydefs      html.entities
>  ==================  ===============================
>
>
>  http package
>  ////////////
>
>  =================  ===============================
>  Current Name       Replacement Name
>  =================  ===============================
>  httplib            http.client
>  BaseHTTPServer     http.server [2]_
>  CGIHTTPServer      http.server [2]_
>  SimpleHTTPServer   http.server [2]_
>  Cookie             http.cookies
>  cookielib          http.cookiejar
>  =================  ===============================
>
>  .. [2] The ``http.server`` module can combine the specified modules
>        safely as they have no naming conflicts.
>
>
>  tkinter package
>  ///////////////
>
>  ==================  ===============================
>  Current Name        Replacement Name
>  ==================  ===============================
>  Canvas              tkinter.canvas
>  Dialog              tkinter.dialog
>  FileDialog          tkinter.filedialog [4]_
>  FixTk               tkinter._fix
>  ScrolledText        tkinter.scrolledtext
>  SimpleDialog        tkinter.simpledialog [5]_
>  Tix                 tkinter.tix
>  Tkconstants         tkinter.constants
>  Tkdnd               tkinter.dnd
>  Tkinter             tkinter.__init__
>  tkColorChooser      tkinter.colorchooser
>  tkCommonDialog      tkinter.commondialog
>  tkFileDialog        tkinter.filedialog [4]_
>  tkFont              tkinter.font
>  tkMessageBox        tkinter.messagebox
>  tkSimpleDialog      tkinter.simpledialog [5]_
>  turtle              tkinter.turtle
>  ==================  ===============================
>
>  .. [4] ``tkinter.filedialog`` can safely combine ``FileDialog`` and
>        ``tkFileDialog`` as there are no naming conflicts.
>
>  .. [5] ``tkinter.simpledialog`` can safely combine ``SimpleDialog``
>        and ``tkSimpleDialog`` as they have no naming conflicts.
>
>
>  urllib package
>  //////////////
>
>  Originally this new package was to be named ``url``, but because of
>  the common use of the name as a variable, it has been deemed better
>  to keep the name ``urllib`` and instead shift existing modules around
>  into a new package.
>
>  ==================  ===============================
>  Current Name        Replacement Name
>  ==================  ===============================
>  urllib2             urllib.request
>  urlparse            urllib.parse
>  urllib              urllib.parse, urllib.request [6]_
>  ==================  ===============================
>
>  .. [6] The quoting-related functions from ``urllib`` will be added
>        to ``urllib.parse``. ``urllib.URLopener`` and
>        ``urllib.FancyURLopener`` will be added to ``urllib.request``
>        as long as the documentation for both modules is updated.
>
>
>  xmlrpc package
>  //////////////
>
>  ==================  ===============================
>  Current Name        Replacement Name
>  ==================  ===============================
>  xmlrpclib           xmlrpc.client
>  SimpleXMLRPCServer  xmlrpc.server [3]_
>  CGIXMLRPCServer     xmlrpc.server [3]_
>  ==================  ===============================
>
>  .. [3] The modules being combined into ``xmlrpc.server`` have no
>        naming conflicts and thus can safely be merged.
>
>
>  Transition Plan
>  ===============
>
>  For modules to be removed
>  -------------------------
>
>  For the removal of modules that are continuing to exist in the Python
>  2.x series (i.e., not deprecated explicitly in the 2.x series),
>  ``warnings.warn3k()`` will be used to issue a DeprecationWarning.

FYI, we can also flag these using 2to3.
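For reference, a rough sketch of what a removal warning could look
like at runtime. It uses the plain ``warnings.warn`` call rather than
the ``warn3k`` helper named in the PEP text, and ``warn_removed`` is a
made-up name for illustration:

```python
import warnings


def warn_removed(module_name):
    """Issue a DeprecationWarning for a module slated for removal."""
    warnings.warn(
        "the %s module has been removed in Python 3.0" % module_name,
        DeprecationWarning,
        stacklevel=2,  # point at the caller, not at this helper
    )


# DeprecationWarning is often filtered out by default, so record
# everything explicitly to see the effect.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_removed("mhlib")

print(caught[0].category.__name__, caught[0].message)
```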

>  Renaming of modules
>  -------------------
>
>  For modules that are renamed, stub modules will be created with the
>  original names and be kept in a directory within the stdlib (e.g. like
>  how lib-old was once used).  The need to keep the stub modules within
>  a directory is to prevent naming conflicts with case-insensitive
>  filesystems in those cases where nothing but the case of the module
>  is changing.
>
>  These stub modules will import the module code based on the new
>  naming.  The same type of warning being raised by modules being
>  removed will be raised in the stub modules.
>
>  Support in the 2to3 refactoring tool for renames will also be used
>  [#2to3]_.  Import statements will be rewritten so that only the import
>  statement and none of the rest of the code needs to be touched.  This
>  will be accomplished by using the ``as`` keyword in import statements
>  to bind in the module namespace to the old name while importing based
>  on the new name.

You should cite the existing fix_imports fixer as one example of how
to do this: http://svn.python.org/view/sandbox/trunk/2to3/lib2to3/fixes/fix_imports.py?view=markup
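To make the ``as`` rewriting concrete, here is a toy sketch. It is not
how fix_imports works (that fixer operates on a parse tree); this
version only handles a bare ``import X`` line with a regular
expression, and the ``RENAMES`` table is a small sample taken from the
rename tables above:

```python
import re

# Old name -> new name, a sample of the renames listed in the PEP.
RENAMES = {
    "Queue": "queue",
    "ConfigParser": "configparser",
    "SocketServer": "socketserver",
}

_IMPORT = re.compile(r"^(\s*)import\s+(\w+)\s*$")


def rewrite_import(line):
    """Rewrite 'import Old' to 'import new as Old'.

    Expects a single line without its trailing newline; any other
    kind of line is returned unchanged.
    """
    match = _IMPORT.match(line)
    if match is None or match.group(2) not in RENAMES:
        return line
    indent, old = match.groups()
    return "%simport %s as %s" % (indent, RENAMES[old], old)


print(rewrite_import("import Queue"))
```

Because the old name stays bound in the module namespace, the rest of
the code (``Queue.Queue()`` and so on) does not need to be touched.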

>  Open Issues
>  ===========
>
>  Renaming of modules maintained outside of the stdlib
>  ----------------------------------------------------
>
>  xml.etree.ElementTree not only does not meet PEP 8 naming standards
>  but it also has an exposed C implementation [#pep-0008]_.  It is an
>  externally maintained package, though [#pep-0360]_.  A request will be
>  made for the maintainer to change the name so that it matches PEP 8
>  and hides the C implementation.
>
>
>  Rejected Ideas
>  ==============
>
>  Modules that were originally suggested for removal
>  --------------------------------------------------
>
>  * asynchat/asyncore
>
>   + Josiah Carlson has said he will maintain the modules.
>
>  * audioop/sunau/aifc
>
>    + Audio modules where the formats are still used.
>
>  * base64/quopri/uu
>
>   + All still widely used.
>   + 'codecs' module does not provide as nice of an API for basic
>     usage.
>
>  * fileinput
>
>   + Useful when having to work with stdin.
>
>  * linecache
>
>    + Used internally in several places.
>
>  * nis
>
>   + Testimonials from people that new installations of NIS are still
>     occurring
>
>  * getopt
>
>   + Simpler than optparse.
>
>  * repr
>
>   + Useful as a basis for overriding.
>   + Used internally.
>
>  * telnetlib
>
>   + Really handy for quick-and-dirty remote access.
>   + Some hardware supports using telnet for configuration and
>     querying.
>
>  * Tkinter
>
>   + Would prevent IDLE from existing.
>   + No GUI toolkit would be available out of the box.
>
>
>  Introducing a new top-level package
>  -----------------------------------
>
>  It has been suggested that the entire stdlib be placed within its own
>  package.  This PEP will not address this issue as it has its own
>  design issues (naming, does it deserve special consideration in import
>  semantics, etc.).  Everything within this PEP can easily be handled if
>  a new top-level package is introduced.
>
>
>  Introducing new packages to contain theme-related modules
>  ---------------------------------------------------------
>
>  During the writing of this PEP it was noticed that certain themes
>  appeared in the stdlib.  In the past people have suggested introducing
>  new packages to help collect modules that share a similar theme (e.g.,
>  audio).  An Open Issue was created to suggest some new packages to
>  introduce.
>
>  In the end, though, not enough support could be pulled together to
>  warrant moving forward with the idea.  Instead name simplification has
>  been chosen as the guiding force for PEPs to create.
>
>
>  References
>  ==========
>
>  .. [#pep-0004] PEP 4: Deprecation of Standard Modules
>     (http://www.python.org/dev/peps/pep-0004/)
>
>  .. [#pep-0008] PEP 8: Style Guide for Python Code
>     (http://www.python.org/dev/peps/pep-0008/)
>
>  .. [#pep-0324] PEP 324: subprocess -- New process module
>     (http://www.python.org/dev/peps/pep-0324/)
>
>  .. [#pep-0360] PEP 360: Externally Maintained Packages
>     (http://www.python.org/dev/peps/pep-0360/)
>
>  .. [#module-index] Python Documentation: Global Module Index
>     (http://docs.python.org/modindex.html)
>
>  .. [#timing-module] Python Library Reference: Obsolete
>     (http://docs.python.org/lib/obsolete-modules.html)
>
>  .. [#silly-old-stuff] Python-Dev email: "Py3k release schedule worries"
>     (http://mail.python.org/pipermail/python-3000/2006-December/005130.html)
>
>  .. [#thread-deprecation] Python-Dev email: Autoloading?
>     (http://mail.python.org/pipermail/python-dev/2005-October/057244.html)
>
>  .. [#py-dev-summary-2004-11-01] Python-Dev Summary: 2004-11-01
>     (http://www.python.org/dev/summary/2004-11-01_2004-11-15/#id10)
>
>  .. [#2to3] 2to3 refactoring tool
>     (http://svn.python.org/view/sandbox/trunk/2to3/)
>
>  .. [#pyopengl] PyOpenGL
>     (http://pyopengl.sourceforge.net/)
>
>  .. [#pil] Python Imaging Library (PIL)
>     (http://www.pythonware.com/products/pil/)
>
>  .. [#twisted] Twisted
>     (http://twistedmatrix.com/trac/)
>
>  .. [#irix-retirement] SGI Press Release:
>     End of General Availability for MIPS IRIX Products -- December 2006
>     (http://www.sgi.com/support/mips_irix.html)
>
>  .. [#irix-forms] FORMS Library by Mark Overmars
>     (ftp://ftp.cs.ruu.nl/pub/SGI/FORMS)
>
>  .. [#sun-au] Wikipedia: Au file format
>     (http://en.wikipedia.org/wiki/Au_file_format)
>
>  .. [#appscript] appscript
>     (http://appscript.sourceforge.net/)
>
>  .. [#ast] _ast module
>     (http://docs.python.org/lib/ast.html)
>
>  .. [#ast-removal] python-dev email: getting compiler package failures
>     (http://mail.python.org/pipermail/python-3000/2007-May/007615.html)
>
>
>  Copyright
>  =========
>
>  This document has been placed in the public domain.
>
>
>
>  ..
>    Local Variables:
>    mode: indented-text
>    indent-tabs-mode: nil
>    sentence-end-double-space: t
>    fill-column: 70
>    coding: utf-8
>    End:

From martin at v.loewis.de  Thu May  1 17:26:03 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 01 May 2008 17:26:03 +0200
Subject: [Python-3000] range() issues
In-Reply-To: <4819D6B5.3060905@gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>	<e04bdf310804260450y120dc2b5j26abb5ea0be4096a@mail.gmail.com>	<loom.20080426T131056-294@post.gmane.org>	<e04bdf310804261149n1f9c914fo151c0415fc872aab@mail.gmail.com>	<d38f5330804270407w5d915e2cy3812621769cbb45e@mail.gmail.com>	<ca471dc20804281618r79fb8d8bn48e474bfede286fe@mail.gmail.com>	<d38f5330804290730oe40394dr87f9849d89d4851f@mail.gmail.com>	<5c6f2a5d0804291317k424a4a89v37f9b1691d9f76ad@mail.gmail.com>	<d38f5330804291330t34438ce8j506a044ab28ee427@mail.gmail.com>	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>	<48196C80.6020608@v.loewis.de>	<4819A506.6090807@lemurconsulting.com>
	<4819CB17.2050109@v.loewis.de> <4819D6B5.3060905@gmail.com>
Message-ID: <4819E10B.2040101@v.loewis.de>

Nick Coghlan wrote:
> Martin v. Löwis wrote:
>>> In the slow example given, only one of the returned items needs to be a
>>> long
>>
>> This is Py3k. They are all longs.
> 
> Not inside the object they aren't

Right, inside, they are longs - but the *returned items* are all longs.

> One way to optimise this (since all we need to support here is counting
> rather than arbitrary arithmetic) would be for the longrange iterator to
> use some simple pure C fixed point arithmetic internally to keep track
> of an arbitrarily long counter, and only convert to a Python long when
> it has to (just like the optimised shortrange iterator).
> 
> I'm not sure it is worth the hassle though.

What simple pure C fixed point arithmetic would you be thinking of? The
long type *is* a pure C fixed point arithmetic.

There are perhaps some simplifications possible in longrangeiter_next,
e.g. it doesn't need to perform a multiplication, but could
just add the step each time. Also, it could cache the value 1 in a
global variable, rather than creating a fresh one each time.

Other than that, I cannot imagine why another fixed point arithmetic
might be significantly faster.
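The multiplication-versus-addition point can be sketched in Python
terms. This only models the arithmetic, not the actual C code of
longrangeiter_next, and both function names are made up:

```python
def by_multiplication(start, step, length):
    """Compute each item from scratch: one multiply and one add."""
    for i in range(length):
        yield start + i * step


def by_addition(start, step, length):
    """Keep a running value and add the step each time."""
    value = start
    for _ in range(length):
        yield value
        value += step


big = 2 ** 70
print(list(by_multiplication(big, 3, 4)) == list(by_addition(big, 3, 4)))
```

Both produce the same sequence; the second form trades the
multiplication for a single addition per item.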

Regards,
Martin

From guido at python.org  Thu May  1 17:57:10 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 08:57:10 -0700
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <20080501142524.GA3546@panix.com>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>
	<20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de>
	<ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>
	<481955B5.2030805@v.loewis.de> <20080501142524.GA3546@panix.com>
Message-ID: <ca471dc20805010857h6f24a52cm219c55b06cae0a3a@mail.gmail.com>

On Thu, May 1, 2008 at 7:25 AM, Aahz <aahz at pythoncraft.com> wrote:
>  Actually, the primary application I'm thinking of is a CGI that displays
>  part of a directory listing (paged) for manual processing of individual
>  files.

But wouldn't you usually want the listing sorted, while os.listdir()
doesn't guarantee sorting? So you'd still have to read the entire
thing, sort it, and then display the selected page.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu May  1 17:58:22 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 08:58:22 -0700
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <48199954.4000800@gmail.com>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>
	<20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de>
	<ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>
	<481955B5.2030805@v.loewis.de> <48199954.4000800@gmail.com>
Message-ID: <ca471dc20805010858h417c2d7dra0caf524cbba77da@mail.gmail.com>

On Thu, May 1, 2008 at 3:20 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>  I think Giovanni's point is an important one as well - with an iterator,
> you can pipeline your operations far more efficiently, since you don't have
> to wait for the whole directory listing before doing anything (e.g. if
> you're doing some kind of move/rename operation on a directory, you can
> start copying the first file to its new location without having to wait for
> the directory read to finish).
>
>  Reducing the startup delays of an operation can be a very useful thing when
> it comes to providing a user with a good feeling of responsiveness from an
> application (and if it allows the application to more effectively pipeline
> something, there may be an actual genuine improvement in responsiveness,
> rather than just the appearance of one).

This sounds like optimizing for a super-rare case. And please do tell
me if you've timed this.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From aahz at pythoncraft.com  Thu May  1 18:11:07 2008
From: aahz at pythoncraft.com (Aahz)
Date: Thu, 1 May 2008 09:11:07 -0700
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <ca471dc20805010857h6f24a52cm219c55b06cae0a3a@mail.gmail.com>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>
	<20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de>
	<ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>
	<481955B5.2030805@v.loewis.de> <20080501142524.GA3546@panix.com>
	<ca471dc20805010857h6f24a52cm219c55b06cae0a3a@mail.gmail.com>
Message-ID: <20080501161106.GA13254@panix.com>

On Thu, May 01, 2008, Guido van Rossum wrote:
> On Thu, May 1, 2008 at 7:25 AM, Aahz <aahz at pythoncraft.com> wrote:
>>
>>  Actually, the primary application I'm thinking of is a CGI that displays
>>  part of a directory listing (paged) for manual processing of individual
>>  files.
> 
> But wouldn't you usually want the listing sorted, while os.listdir()
> doesn't guarantee sorting? So you'd still have to read the entire
> thing, sort it, and then display the selected page.

With hundreds of thousands of files, the sorting is done after filtering,
so reducing the memory consumed during the filter stage is still
extremely useful.
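A sketch of the filter-then-sort-then-page idea. It assumes an
iterator-based directory API; ``os.scandir`` (added to Python well
after this thread) is used as a stand-in, and ``page_of_names`` and
its parameters are hypothetical:

```python
import os


def page_of_names(directory, keep, page, per_page):
    """Return one sorted page of the names accepted by keep().

    Entries are filtered while the directory is being read, so only
    the matching names are ever held in memory and sorted.
    """
    with os.scandir(directory) as entries:
        names = [entry.name for entry in entries if keep(entry.name)]
    names.sort()
    first = page * per_page
    return names[first:first + per_page]
```

For example, ``page_of_names(path, lambda n: n.endswith(".txt"), 0, 50)``
would give the first fifty ``.txt`` names in sorted order.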
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Help a hearing-impaired person: http://rule6.info/hearing.html

From ishimoto at gembook.org  Thu May  1 18:21:04 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 2 May 2008 01:21:04 +0900
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <ca471dc20804301034u626dc844m84bf06c4b8c57d40@mail.gmail.com>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com>
	<4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru>
	<4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru>
	<480612CE.1010300@gmail.com>
	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>
	<797440730804181935p1f618e90ob1b8b9efb48932c3@mail.gmail.com>
	<ca471dc20804301034u626dc844m84bf06c4b8c57d40@mail.gmail.com>
Message-ID: <797440730805010921r3d0b785bjb10e05d7aefc8d1e@mail.gmail.com>

On Thu, May 1, 2008 at 2:34 AM, Guido van Rossum <guido at python.org> wrote:
>  This should be done with a new function, not added to print. Once you
>  specify an encoding, you have to write to sys.stdout.buffer, which is
>  the underlying binary stream; but you'd have to flush the
>  TextIOWrapper and deal with incomplete codec state, and in general I
>  don't think it's a good idea.

Thank you for your comment. I'll reconsider this part in the PEP.
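For illustration only, a sketch of what writing with an explicit
encoding involves. ``write_encoded`` is a made-up helper, and it
ignores the incomplete-codec-state problem mentioned above, which is
part of why this is awkward to do in general:

```python
import io


def write_encoded(text_stream, text, encoding):
    """Write text to the binary stream under text_stream, encoded.

    The text layer is flushed first so that earlier output is not
    reordered relative to the bytes written here.
    """
    text_stream.flush()
    text_stream.buffer.write(text.encode(encoding, "backslashreplace"))
    text_stream.buffer.flush()


# Stand-in for sys.stdout: a text wrapper over an in-memory byte buffer.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii")
out.write("plain ")
write_encoded(out, "caf\u00e9", "utf-8")
print(raw.getvalue())
```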

From martin at v.loewis.de  Thu May  1 18:33:05 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 01 May 2008 18:33:05 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<dcbbbb410804151810pc251d6ay226959416ace2cdb@mail.gmail.com>	<ca471dc20804152155t14aadfeqb4b79f3055dccc19@mail.gmail.com>	<dcbbbb410804152216u1b9f646p92f2f4b2768f1e4e@mail.gmail.com>	<797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com>	<4805ECE1.6040501@gmail.com>
	<20080416124529.GC8598@phd.pp.ru>	<4805FD56.6070902@gmail.com>
	<20080416133046.GB16087@phd.pp.ru>	<480612CE.1010300@gmail.com>	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>
	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4819F0C1.8050401@v.loewis.de>

> The problem is that this doesn't display the representation of strings
> and identifier names in an unambiguous way.  "AKMOT" could be
> all-ASCII, it could be all-Cyrillic, or it could be a mixture of
> ASCII, Cyrillic, and Greek.

I don't see this as a problem. Yes, it can happen, but no, it is not a
problem.

Unless I lost the thread, we are still talking about the repr() of
regular strings here, right?
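When the distinction does matter, it can be recovered separately from
repr(), for example by looking up character names. A small sketch
(``describe`` is a made-up helper):

```python
import unicodedata


def describe(s):
    """Return the Unicode name of each character in s."""
    return [unicodedata.name(ch, "U+%04X" % ord(ch)) for ch in s]


latin = "A"          # U+0041
cyrillic = "\u0410"  # looks the same in most fonts

print(describe(latin))
print(describe(cyrillic))
```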

> How about choosing a standard Python repertoire (based on the Unicode
> standard, of course) of which characters get a graphic repr and which
> ones get \u-escaped, and have a post-hook for repr which gets passed
> the string repr proposes to print out?

You mean, you only display the characters if they form a valid
identifier? That would not be good, since it would disallow display
of symbols.

Regards,
Martin

From martin at v.loewis.de  Thu May  1 18:38:53 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 01 May 2008 18:38:53 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<4805ECE1.6040501@gmail.com>
	<20080416124529.GC8598@phd.pp.ru>	<4805FD56.6070902@gmail.com>
	<20080416133046.GB16087@phd.pp.ru>	<480612CE.1010300@gmail.com>	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>
	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4819F21D.8070808@v.loewis.de>

>  > I think "standard repertoire based on Unicode" may be confusing the issue.
> 
> By "standard repertoire" I mean that all Pythons will show the same
> characters the same way, while "based on Unicode" is intended to mean
> looking at TR#36 and TR#39 in picking the repertoires.

I don't think either TR#36 or TR#39 is applicable here. This is not
identifier syntax; there may be various symbols and whatnot in the
string, which should also be rendered as-is.

The escaping that repr() does is *not* to achieve unambiguity,
but to achieve printability.

Regards,
Martin

From guido at python.org  Thu May  1 18:41:07 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 09:41:07 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
Message-ID: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>

Some of you may have seen a video recorded in November 2006 where I
showed off Mondrian, a code review tool that I was developing for
Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped
that I could release Mondrian as open source, but it was not to be:
due to its popularity inside Google, it became more and more tied to
proprietary Google infrastructure like Bigtable, and it remained
limited to Perforce, the commercial revision control system most used
at Google.

What I'm announcing now is the next best thing: a code review tool
for use with Subversion, inspired by Mondrian and (soon to be)
released as open source. Some of the code is even directly derived
from Mondrian. Most of the code is new though, written using Django
and running on Google App Engine.

I'm inviting the Python developer community to try out the tool on the
web for code reviews. I've added a few code reviews already, but I'm
hoping that more developers will upload at least one patch for review
and invite a reviewer to try it out.

To try it out, go here:

    http://codereview.appspot.com

Please use the Help link in the top right to read more on how to use
the app. Please sign in using your Google Account (either a Gmail
address or a non-Gmail address registered with Google) to interact
more with the app (you need to be signed in to create new issues and
to add comments to existing issues).

Don't hesitate to drop me a note with feedback -- note though that
there are a few known issues listed at the end of the Help page. The
Help page is really a wiki, so feel free to improve it!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From martin at v.loewis.de  Thu May  1 19:12:07 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Thu, 01 May 2008 19:12:07 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	
	<dcbbbb410804152216u1b9f646p92f2f4b2768f1e4e@mail.gmail.com>	
	<797440730804160249j35a98e5xd9303fbe71430568@mail.gmail.com>	
	<4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru>	
	<4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru>	
	<480612CE.1010300@gmail.com>	
	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>	
	<4807C3C1.6010602@v.loewis.de>
	<ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>
Message-ID: <4819F9E7.9040706@v.loewis.de>

> I still like this proposal. I don't quite understand the competing (?)
> proposal by Stephen Turnbull; perhaps Stephen can compare and contrast
> the two proposals? And where does Atsuo fall?

IIUC, Stephen proposes to use some of the "security" algorithms for
display, without (yet) specifying which one specifically.

I don't think they apply, as these algorithms are designed for
identifiers (in particular for use in programming languages and
domain names); any character classified as "confusing" would get
escaped. As Stephen elaborates, that would have the undesirable
side effect of escaping the Cyrillic A (i.e. А), likewise for
some Greek letters. In any case, one would have to write a
precise specification first (UTR#36/#39 leave options), and probably
extend the tables in unicodedata.

Atsuo's latest proposal (http://wiki.python.org/moin/Python3kStringRepr)
is an elaboration of mine, I think. I would have phrased it slightly
differently, i.e.

- escaped are all Z* and C* characters, plus backslash, except space.
  In UCS-2 builds, half surrogates get escaped only if they don't occur
  as a pair.
- escaping looks like this:
  * \r, \n, \t, \\
  * \xXX for characters from Latin-1
  * \uXXXX for characters from the BMP
  * \U00XXXXXX for anything else

What I didn't have in my original proposal was escaping of Zs
except for space, which then would also escape NBSP, EN QUAD,
EM QUAD, THIN SPACE, HAIR SPACE, OGHAM SPACE MARK, etc. Escaping
them is fine also. Also, I didn't consider surrogate pairs in
UCS-2 builds originally; they should (of course) get represented
as-is.
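The rules above can be sketched as follows. This is an illustration
of the proposal as worded here, not the eventual implementation; the
function names are made up, and the surrogate handling for UCS-2
builds is left out:

```python
import unicodedata

# Characters with a short, conventional escape.
_SHORT = {"\r": "\\r", "\n": "\\n", "\t": "\\t", "\\": "\\\\"}


def escape_char(ch):
    """Escape one character following the Z*/C* rule, except space."""
    if ch in _SHORT:
        return _SHORT[ch]
    if ch == " " or unicodedata.category(ch)[0] not in "ZC":
        return ch  # printable: shown as-is
    cp = ord(ch)
    if cp <= 0xFF:
        return "\\x%02x" % cp
    if cp <= 0xFFFF:
        return "\\u%04x" % cp
    return "\\U%08x" % cp


def escape_string(s):
    return "".join(escape_char(ch) for ch in s)


print(escape_string("a\u00a0b\n"))
```

Letters and symbols from any script pass through untouched; only
separators, control characters, and other C* code points are escaped.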

The issue then is output of repr to a device, which may go wrong
in two ways:
- the device claims it supports the character, but doesn't actually
  have a glyph for it. In that case, the terminal encoding should
  be adjusted.
- the device cannot display certain characters in the repr. Here,
  an escaping error handler can be used if desired.
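The second case can be sketched with the standard
``backslashreplace`` error handler (``displayable`` is a made-up
helper name):

```python
def displayable(text, encoding):
    """Escape whatever the target encoding cannot represent."""
    return text.encode(encoding, "backslashreplace").decode(encoding)


print(displayable("na\u00efve \u0410", "ascii"))
print(displayable("na\u00efve \u0410", "latin-1"))
```

Characters the device encoding supports are left alone, so the same
string is escaped more or less heavily depending on the terminal.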

Regards,
Martin


From ishimoto at gembook.org  Thu May  1 19:16:26 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 2 May 2008 02:16:26 +0900
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <87mynazn05.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<20080416124529.GC8598@phd.pp.ru> <4805FD56.6070902@gmail.com>
	<20080416133046.GB16087@phd.pp.ru> <480612CE.1010300@gmail.com>
	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>
	<4807C3C1.6010602@v.loewis.de>
	<ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>
	<797440730804302006r49124aeco3876b6264698f65d@mail.gmail.com>
	<87mynazn05.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <797440730805011016s17e93375jb3f2d35e81c105bf@mail.gmail.com>

On Thu, May 1, 2008 at 1:06 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> atsuo ishimoto writes:
>
>   > > And where does Atsuo fall?
>   >
>   > Sorry, I cannot understand word 'fall', perhaps a colloquial expression?
>
>  In this case, it means "what is your opinion, compared to Stephen and
>  Martin?"

Oh, I see. Thank you. As I wrote, I think these proposals are not
competing, so I don't 'fall' on either side.

In my PEP, I proposed to use Unicode properties, based on the proposal
from Michael and Martin. It's almost identical to what Martin wrote, but
I added the Zs (Separator, Space) characters other than ASCII space
('\x20'). This category contains the characters listed at the end of
this mail. I assume these characters should be hex-escaped, although I
know nothing about them.

I think readability beats unambiguity for repr(), so I don't agree with
Stephen's view that "repr is like quoted-printable encoding in MIME".
If the standard repertoire Stephen proposed is desired, the conversion
based on the repertoire should be applied to the strings repr()
produces. Such a repertoire will be more useful if we have:

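import sys

def standard_string(s):
    # _convert_ambiguous_chars is a placeholder for the repertoire-based
    # conversion; it is not defined here.
    return _convert_ambiguous_chars(s)

print(standard_string(repr(obj)), standard_string(sys.stdin.readline()))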

>  Great!  I'll take a look tomorrow or Friday.
>

Thank you. I'm looking forward to your feedback.


Characters defined as Zs::
---------------------------------------------------------
0x20 SPACE
0xa0 NO-BREAK SPACE
0x1680 OGHAM SPACE MARK
0x2000 EN QUAD
0x2001 EM QUAD
0x2002 EN SPACE
0x2003 EM SPACE
0x2004 THREE-PER-EM SPACE
0x2005 FOUR-PER-EM SPACE
0x2006 SIX-PER-EM SPACE
0x2007 FIGURE SPACE
0x2008 PUNCTUATION SPACE
0x2009 THIN SPACE
0x200a HAIR SPACE
0x200b ZERO WIDTH SPACE
0x202f NARROW NO-BREAK SPACE
0x205f MEDIUM MATHEMATICAL SPACE
0x3000 IDEOGRAPHIC SPACE

From stephen at xemacs.org  Thu May  1 19:33:48 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 02 May 2008 02:33:48 +0900
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <4819F21D.8070808@v.loewis.de>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru>
	<4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru>
	<480612CE.1010300@gmail.com>
	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>
	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>
	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>
	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4819F21D.8070808@v.loewis.de>
Message-ID: <87y76u9ber.fsf@uwakimon.sk.tsukuba.ac.jp>

"Martin v. Löwis" writes:

 > The escaping that repr() does is *not* to achieve unambiguity,
 > but to achieve printability.

Well, if that is the case, then I withdraw my comments pretty much
entirely, and apologize for the noise.  I think you've already
specified what is needed to achieve printability correctly.


From martin at v.loewis.de  Thu May  1 19:31:20 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 01 May 2008 19:31:20 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <87y76u9ber.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<4805ECE1.6040501@gmail.com>	<20080416124529.GC8598@phd.pp.ru>	<4805FD56.6070902@gmail.com>	<20080416133046.GB16087@phd.pp.ru>	<480612CE.1010300@gmail.com>	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>	<4819F21D.8070808@v.loewis.de>
	<87y76u9ber.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4819FE68.2010400@v.loewis.de>

>  > The escaping that repr() does is *not* to achieve unambiguity,
>  > but to achieve printability.
> 
> Well, if that is the case, then I withdraw my comments pretty much
> entirely, and apologize for the noise.  I think you've already
> specified what is needed to achieve printability correctly.

After I posted this, I read Guido's earlier message that the
case may not be as clear. So please take this as my own opinion,
not as a given - some people apparently want repr to provide
unambiguous output also.

If so, I still don't think the security mechanisms of Unicode
apply - if you have combining characters in the string, and
the precombined version also exists in Unicode, then those
algorithms may still not help. Likewise for compatibility
characters.
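
A small illustration of the combining-character point, as a sketch for
Python 3 using the standard unicodedata module: a precomposed letter and
its decomposed spelling usually render identically yet are different
strings, so an escaping rule that looks at single characters cannot tell
a reader which one is present. The same holds for compatibility
characters such as ligatures.

```python
import unicodedata

composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
ligature = "\ufb01"      # LATIN SMALL LIGATURE FI (a compatibility character)

# The two spellings are distinct strings of different lengths.
print(composed == decomposed, len(composed), len(decomposed))

# Canonical normalization (NFC) folds the decomposed form into the
# precomposed one; compatibility normalization (NFKC) expands the ligature.
print(unicodedata.normalize("NFC", decomposed) == composed)
print(unicodedata.normalize("NFKC", ligature) == "fi")
```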

Regards,
Martin

From tjreedy at udel.edu  Thu May  1 19:49:37 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 1 May 2008 13:49:37 -0400
Subject: [Python-3000] Displaying strings containing unicode escapes
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<4805ECE1.6040501@gmail.com><20080416124529.GC8598@phd.pp.ru>	<4805FD56.6070902@gmail.com><20080416133046.GB16087@phd.pp.ru>	<480612CE.1010300@gmail.com>	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com><87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4819F21D.8070808@v.loewis.de>
Message-ID: <fvcvrd$j4a$1@ger.gmane.org>


"Martin v. Löwis" <martin at v.loewis.de> wrote in message
news:4819F21D.8070808 at v.loewis.de...
| >  > I think "standard repertoire based on Unicode" may be confusing the
| >  > issue.
| >
| > By "standard repertoire" I mean that all Pythons will show the same
| > characters the same way, while "based on Unicode" is intended to mean
| > looking at TR#36 and TR#39 in picking the repertoires.
|
| I don't think either TR#36 or TR#39 is applicable here. This is not
| identifier syntax; there may be various symbols and whatnot in the
| string, which should also be rendered as-is.
|
| The escaping that repr() does is *not* to achieve unambiguity,
| but to achieve printability.

I agree with Martin that chasing 'unambiguity' is something of a chimera.
Whether the glyphs for two Unicode chars are identical depends on the
display system.  As I type these here, 1 (one) and l (el) are barely
distinguishable, depending on reading lens and distance.  Should one be
escaped?  I think not.  I have had displays in which they are pixel for
pixel identical, but also ones which made them clearly different.  Ditto
for 0 (zero) and O (Oh).  A and <Alpha> *could* be made to look different
on modern high-definition outputs.  I suspect they already have been or
will be.

I think standard Python should somehow have two options: escape everything
but ASCII (for unambiguity and old display systems) and escape nothing that
is potentially printable (leaving partially capable systems to fare as they
will).  In-between solutions will ultimately be programmer and system
specific.
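
For comparison, Python 3 as eventually released offers roughly these two
behaviours as built-ins: ascii() escapes everything outside ASCII, while
repr() leaves characters that the Unicode database considers printable
untouched. A minimal sketch:

```python
s = "caf\u00e9 \u3042"   # a Latin-1 letter and a Hiragana letter

print(ascii(s))   # every non-ASCII character is backslash-escaped
print(repr(s))    # printable non-ASCII characters are shown as-is

# Non-printable characters, such as NO-BREAK SPACE, are still escaped by repr().
print(repr("\u00a0"))
```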

Terry Jan Reedy


From phd at phd.pp.ru  Thu May  1 19:56:32 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 1 May 2008 21:56:32 +0400
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <fvcvrd$j4a$1@ger.gmane.org>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<480612CE.1010300@gmail.com>
	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>
	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>
	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4819F21D.8070808@v.loewis.de> <fvcvrd$j4a$1@ger.gmane.org>
Message-ID: <20080501175632.GA8293@phd.pp.ru>

On Thu, May 01, 2008 at 01:49:37PM -0400, Terry Reedy wrote:
> I think standard Python should somehow have two options: escape everything 
> but ASCII (for unambiguity and old display systems) and escape nothing that 
> is potentially printable (leaving partially capable systems to fare as they 
> will).  In-between solutions will ultimately be programmer and system 
> specific.

   +1

   repr() should not escape printable chars, and there should be a codec to
escape everything, so one could write mystring.encode("escape_string").
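
The codec name "escape_string" above is a proposal, not an existing
codec. The existing "unicode_escape" codec behaves similarly, so a sketch
of the idea in Python 3 might look like this (note that it returns bytes
and leaves printable ASCII alone):

```python
text = "na\u00efve \u03b1"

escaped = text.encode("unicode_escape")   # bytes containing only ASCII
print(escaped)

# Decoding with the same codec restores the original string.
print(escaped.decode("unicode_escape") == text)
```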

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From brett at python.org  Thu May  1 20:02:52 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 1 May 2008 11:02:52 -0700
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <43aa6ff70805010741m1a39e740ic5e42e2c74109b03@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<43aa6ff70805010741m1a39e740ic5e42e2c74109b03@mail.gmail.com>
Message-ID: <bbaeab100805011102l4d45750ewe605962ade512d33@mail.gmail.com>

On Thu, May 1, 2008 at 7:41 AM, Collin Winter <collinw at gmail.com> wrote:
>
> On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon <brett at python.org> wrote:

>  >  Transition Plan
>  >  ===============
>  >
>  >  For modules to be removed
>  >  -------------------------
>  >
>  >  For the removal of modules that are continuing to exist in the Python
>  >  2.x series (i.e., not deprecated explicitly in the 2.x series),
>  >  ``warnings.warn3k()`` will be used to issue a DeprecationWarning.
>
>  FYI, we can also flag these using 2to3.
>

I can't remember if we have a guiding rule on this yet, but if 2to3
can fix this, do we still want the warning? Obviously both names will
be provided so people can move their code over, but perhaps the
warning is not needed?

>
>  >  Renaming of modules
>  >  -------------------
>  >
>  >  For modules that are renamed, stub modules will be created with the
>  >  original names and be kept in a directory within the stdlib (e.g. like
>  >  how lib-old was once used).  The need to keep the stub modules within
>  >  a directory is to prevent naming conflicts with case-insensitive
>  >  filesystems in those cases where nothing but the case of the module
>  >  is changing.
>  >
>  >  These stub modules will import the module code based on the new
>  >  naming.  The same type of warning being raised by modules being
>  >  removed will be raised in the stub modules.
>  >
>  >  Support in the 2to3 refactoring tool for renames will also be used
>  >  [#2to3]_.  Import statements will be rewritten so that only the import
>  >  statement and none of the rest of the code needs to be touched.  This
>  >  will be accomplished by using the ``as`` keyword in import statements
>  >  to bind in the module namespace to the old name while importing based
>  >  on the new name.
>
>  You should cite the existing fix_imports fixer as one example of how
>  to do this: http://svn.python.org/view/sandbox/trunk/2to3/lib2to3/fixes/fix_imports.py?view=markup

Done.
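
For readers unfamiliar with the ``as`` rewrite described in the quoted
PEP text, here is a rough sketch. It uses the Queue -> queue rename as
the example and a made-up helper, import_renamed, standing in for a stub
module; the warning category and message are assumptions, not the PEP's
wording.

```python
import importlib
import warnings

# What a 2to3-style rewrite produces: import under the new name and
# bind it to the old name, so the rest of the code is left untouched.
import queue as Queue

q = Queue.Queue()
q.put("item")

def import_renamed(old_name, new_name):
    """Hypothetical stand-in for a stub module: warn, then load the new module."""
    warnings.warn("%s has been renamed to %s" % (old_name, new_name),
                  DeprecationWarning, stacklevel=2)
    return importlib.import_module(new_name)
```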

-Brett

From facundobatista at gmail.com  Thu May  1 20:20:10 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Thu, 1 May 2008 15:20:10 -0300
Subject: [Python-3000] range() issues
In-Reply-To: <4818FC9B.1080809@gmail.com>
References: <1afaf6160804252025q3f36dba5h4164e50785a40926@mail.gmail.com>
	<5c6f2a5d0804291409re370ce2ifb913a6e1e4f1987@mail.gmail.com>
	<1afaf6160804291418s7723dd8cqb37e495a043a4723@mail.gmail.com>
	<ca471dc20804291648j4156facev48f73b268a9f9746@mail.gmail.com>
	<d38f5330804291916ve186314o831409fe1650dfad@mail.gmail.com>
	<ca471dc20804291936g7f02adces779126e71243f83a@mail.gmail.com>
	<1209582854.1924.7.camel@qrnik>
	<ca471dc20804301218u7da88147ieb00e691802db770@mail.gmail.com>
	<4818F773.4060809@canterbury.ac.nz> <4818FC9B.1080809@gmail.com>
Message-ID: <e04bdf310805011120n76006d3eo42d90ad4d1d3b366@mail.gmail.com>

2008/4/30, Nick Coghlan <ncoghlan at gmail.com>:

>  In the bug tracker, Alexander mentioned the possibility of removing
> __len__ and __getitem__ support from range() objects in py3k, and
> implementing only __length_hint__ instead (leaving range() as a bare-bones
> iterable). I'm starting to like that idea more and more.

+1
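
To make the idea concrete, here is a sketch of what a bare-bones iterable
with only a length hint could look like. BareRange is a made-up class,
not the proposed implementation; operator.length_hint() is the
standard-library helper that later Python 3 versions provide for reading
the hint.

```python
import operator

class BareRange:
    """Hypothetical iterable: supports iteration and a length hint, nothing else."""

    def __init__(self, stop):
        self.stop = stop

    def __iter__(self):
        i = 0
        while i < self.stop:
            yield i
            i += 1

    def __length_hint__(self):
        # Advisory only; consumers may use it to preallocate storage.
        return max(self.stop, 0)

r = BareRange(5)
print(list(r))
print(operator.length_hint(r))
```

With no __len__ or __getitem__ defined, len(r) and r[0] both raise
TypeError, which is the trade-off under discussion.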

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From martin at v.loewis.de  Thu May  1 20:59:19 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 01 May 2008 20:59:19 +0200
Subject: [Python-3000] gettext
In-Reply-To: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com>
References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com>
Message-ID: <481A1307.3000605@v.loewis.de>

> Are we going to want to keep the "u" variants of the gettext APIs
> around in 3.0? Also, the unicode parameters (for .install methods)
> don't make much sense in 3.0.
> 
> I don't see how we could remove them in 3.0, but perhaps rename then
> to their non-"u" variants and deprecate?

I think the new module should only support the Unicode API. gettext is
about text, i.e. character strings; there is no need for byte-oriented
APIs.
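
As a sketch of what a Unicode-only API looks like from the caller's side
(Python 3, using gettext.NullTranslations, which simply returns the
message unchanged when no catalog is loaded):

```python
import gettext

translations = gettext.NullTranslations()
_ = translations.gettext   # returns character strings, not bytes

message = _("Hello, world")
print(type(message).__name__, message)
```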

Regards,
Martin

From barry at python.org  Thu May  1 21:15:11 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 1 May 2008 15:15:11 -0400
Subject: [Python-3000] gettext
In-Reply-To: <87d4o7chho.fsf@physik.rwth-aachen.de>
References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com>
	<ca471dc20804241518q5a5a1978v150542bdc1eae122@mail.gmail.com>
	<35FDD892-1F6B-42DA-B5DB-FF5DC6992D46@python.org>
	<87d4o7chho.fsf@physik.rwth-aachen.de>
Message-ID: <ECFD7C89-1518-41C8-9712-41B4D514FE1A@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Apr 30, 2008, at 2:41 PM, Torsten Bronger wrote:

> Indeed.  From today's perspective, I see no use case for getting
> human text snippets in byte strings encoded with the same encoding
> that just happened to be used in the .mo file, or with the
> "preferred system encoding".

Agreed.

> So it is only about the question how much hassle a
> renaming/deprecation generates for existing code.

Maybe we shouldn't be so worried about deprecation.  Py3 is a clean  
break, right?

>> That might argue for renaming ugettext() to gettext() and adding
>> something like a egettext() or bgettext() method.
>
> Okay.  But I think its not much advantage to have the "encoded"
> functions under new names, given that instead of renaming, you can
> also easily use ugettext to mimic their behaviour.

Works for me.

>> OTOH, the current names are inspired from GNU gettext so it seems
>> to me there's not much value in renaming our methods, except to
>> increase confusion and break backward compatibility <wink>.
>
> Well, this is hard to evaluate.  However, I think that if there is
> no danger of getting silent errors, then the module should switch to
> unicode, possibly even unicode-only.  After all, the results of
> gettext are likely to be passed to higher-level functions that use
> (or will switch to) unicode, too.
>
> As for "gettext" returning a unicode string: If clearly documented,
> I see not too much harm in using a different type scheme than C
> gettext; this should be acceptable in a reimplementation in another
> language.

Torsten, I agree.  Let's just rename ugettext() to gettext() and have  
it return unicodes.  That's the cleanest API we can do for Python.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSBoWv3EjvBPtnXfVAQJF1QQAr2G+UHqXkuckx9oREYpwsXhDhISy4pKJ
l3Ai+p+vlVsKIPiYn8HSuJYFRa8QIOBT5EOl6DEDMvQ78hYXu1VaLGWO5bOvnrjS
TtCeyM9xZuXWxB3StHO3ao8pK4VdBtljBsi+3vZ8br+4zZpOKQRwiMoWozqyq6u1
EwxFUwE19qI=
=936h
-----END PGP SIGNATURE-----

From barry at python.org  Thu May  1 21:15:59 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 1 May 2008 15:15:59 -0400
Subject: [Python-3000] gettext
In-Reply-To: <481A1307.3000605@v.loewis.de>
References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com>
	<481A1307.3000605@v.loewis.de>
Message-ID: <FDA60249-569A-45C0-BA15-F509F2393DB9@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 1, 2008, at 2:59 PM, Martin v. Löwis wrote:

>> Are we going to want to keep the "u" variants of the gettext APIs
>> around in 3.0? Also, the unicode parameters (for .install methods)
>> don't make much sense in 3.0.
>>
>> I don't see how we could remove them in 3.0, but perhaps rename then
>> to their non-"u" variants and deprecate?
>
> I think the new module should only support the Unicode API. gettext is
> about text, i.e. character strings; there is no need for byte-oriented
> APIs.

Sounds like you agree that we should just rename the u-variants and  
forget about deprecation, correct?

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSBoW8HEjvBPtnXfVAQJd6gP/TprSKI5X9Q5E8D1pqHU2iGB3yKRuJ+4H
fzjEEG5vX8Uk+JdaPR83FdwBlTMqtzZPNAKKZzjMJQr/u0a0y+M+JhHhQm6AzS5+
Pc6NFDsqW4HDQDhXVezCMwMK0G7+RRdL4bw+i0mtqiTRkXn0H/ImcM7CzCh7hsYz
FuXhiyTiXrQ=
=HuR0
-----END PGP SIGNATURE-----

From martin at v.loewis.de  Thu May  1 21:28:58 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 01 May 2008 21:28:58 +0200
Subject: [Python-3000] gettext
In-Reply-To: <FDA60249-569A-45C0-BA15-F509F2393DB9@python.org>
References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com>
	<481A1307.3000605@v.loewis.de>
	<FDA60249-569A-45C0-BA15-F509F2393DB9@python.org>
Message-ID: <481A19FA.6050202@v.loewis.de>

> Sounds like you agree that we should just rename the u-variants and
> forget about deprecation, correct?

Exactly.

Regards,
Martin

From musiccomposition at gmail.com  Thu May  1 22:00:38 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Thu, 1 May 2008 15:00:38 -0500
Subject: [Python-3000] gettext
In-Reply-To: <ECFD7C89-1518-41C8-9712-41B4D514FE1A@python.org>
References: <1afaf6160804230732t266c8285la97a1fac62f96a8d@mail.gmail.com>
	<ca471dc20804241518q5a5a1978v150542bdc1eae122@mail.gmail.com>
	<35FDD892-1F6B-42DA-B5DB-FF5DC6992D46@python.org>
	<87d4o7chho.fsf@physik.rwth-aachen.de>
	<ECFD7C89-1518-41C8-9712-41B4D514FE1A@python.org>
Message-ID: <1afaf6160805011300x2724dcf7p90d697a7470e192e@mail.gmail.com>

On Thu, May 1, 2008 at 2:15 PM, Barry Warsaw <barry at python.org> wrote:
>  Torsten, I agree.  Let's just rename ugettext() to gettext() and have it
> return unicodes.  That's the cleanest API we can do for Python.

I have a patch for something like this at issue 2512.


-- 
Cheers,
Benjamin Peterson

From barry at python.org  Thu May  1 22:26:34 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 1 May 2008 16:26:34 -0400
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
Message-ID: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

This is a reminder that the LAST planned alpha releases of Python 2.6  
and 3.0 are scheduled for next Wednesday, 07-May-2008.  Please be  
diligent over the next week so that none of your changes break  
Python.  The stable buildbots look moderately okay, let's see what we  
can do about getting them all green:

http://www.python.org/dev/buildbot/stable/

We have a few showstopper bugs, and I will be looking at these more  
carefully starting next week.

http://bugs.python.org/issue?@columns=title,id,activity,versions,status&@sort=activity&@filter=priority,status&@pagesize=50&@startwith=0&priority=1&status=1&@dispname=Showstoppers

Time is running short to get any new features into Python 2.6 and  
3.0.  The release after this one is scheduled to be the first beta  
release, at which time we will institute a feature freeze.  If your  
feature doesn't make it in by then, you'll have to wait until  
2.7/3.1.  If there is something that absolutely must go into 2.6/3.0  
be sure that there is a bug issue open for it and that the Priority is  
set to 'release blocker'.  I may reduce it to critical for the next  
alpha, but we'll review all the release blocker and critical issues  
for the first 2.6 and 3.0 beta releases.

Cheers,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSBonfHEjvBPtnXfVAQLaSwP+IMjYbLryACRColvgTU4ezPHhbBpdDaRA
I2k15cLsqmkFwHitt9TaTlLklnZuETiEfl7pVzow20KW18Z2tWP5U5KVMrVVbrJM
9pMS/vC102FVD88ukyQcPP5q+pw2+r2qTLr3q/205zdELQlWo+Ny6ir6dAgTKOd4
/OZqgCMBHS4=
=MhWr
-----END PGP SIGNATURE-----

From lists at cheimes.de  Thu May  1 23:27:52 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 01 May 2008 23:27:52 +0200
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
Message-ID: <481A35D8.60604@cheimes.de>

Barry Warsaw schrieb:
> This is a reminder that the LAST planned alpha releases of Python 2.6
> and 3.0 are scheduled for next Wednesday, 07-May-2008.  Please be
> diligent over the next week so that none of your changes break Python. 
> The stable buildbots look moderately okay, let's see what we can do
> about getting them all green:

I'd like to draw some attention to two features for the last alpha:

PEP 370: Per user site-packages directory
http://www.python.org/dev/peps/pep-0370/

Alternative memory allocation for ints, floats and longs using PyMalloc
instead of the current block allocation. The issue was discussed at
great length a few months ago but without a final decision.
http://bugs.python.org/issue2039

Christian

From tjreedy at udel.edu  Thu May  1 23:42:39 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 1 May 2008 17:42:39 -0400
Subject: [Python-3000] Invitation to try out open source code review tool
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
Message-ID: <fvddga$gf$1@ger.gmane.org>

As I understand this, one needs a diff to comment on.
I can imagine wanting, or wanting others, to be able to comment on a file
or on lines of a file without making a fake diff (of the file versus
itself or a blank file). Then only one column would be needed.

I presume the current site is for trial purposes.  You obviously don't want 
hundreds of repositories listed.  Are you planning, for instance, to 
suggest that Google project hosting add a Review tab or link to the project 
pages?

And I followed the link to pages about Rietveld ;-)

tjr


From guido at python.org  Fri May  2 00:41:14 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 15:41:14 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <fvddga$gf$1@ger.gmane.org>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<fvddga$gf$1@ger.gmane.org>
Message-ID: <ca471dc20805011541y63dd132eo6e67310eaeea3ffa@mail.gmail.com>

On Thu, May 1, 2008 at 2:42 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> As I understood this, one needs a diff to comment on.
>  I can imagine wanting, or wanting others, to be able to comment on a file
>  or lines of files without making a fake diff (of the file versus itself or
>  a blank file). Then only one column would be needed.

Yeah, this use case is not well supported. In my experience with the
internal tool at Google, I don't think that anybody has ever requested
that feature, so perhaps in practice it's not so common. I mean, who
wants to review a 5000-line file once it's checked in? :-) The right
point for such a review (certainly this is the case at Google) is when
it goes in.

>  I presume the current site is for trial purposes.

Actually I'm hoping to keep it alive forever, just evolving the
functionality based on feedback.

>  You obviously don't want
>  hundreds of repositories listed.

Repository management is a bit of an open problem. Fortunately, when
you use upload.py, you don't need to have a repository listed --
upload.py will specify the correct base URL, especially for
repositories hosted at Google. (I should probably figure out how to
support SourceForge as well...)

>  Are you planning, for instance, to
>  suggest that Google project hosting add a Review tab or link to the project
>  pages?

They've been following my release with interest...

>  And I followed the link to pages about Rietveld ;-)

Thanks. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Fri May  2 01:24:24 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 1 May 2008 19:24:24 -0400
Subject: [Python-3000] Invitation to try out open source code review tool
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com><fvddga$gf$1@ger.gmane.org>
	<ca471dc20805011541y63dd132eo6e67310eaeea3ffa@mail.gmail.com>
Message-ID: <fvdjf3$hda$1@ger.gmane.org>


"Guido van Rossum" <guido at python.org> wrote in message 
news:ca471dc20805011541y63dd132eo6e67310eaeea3ffa at mail.gmail.com...
| On Thu, May 1, 2008 at 2:42 PM, Terry Reedy <tjreedy at udel.edu> wrote:
| > As I understood this, one needs a diff to comment on.
| >  I can imagine wanting, or wanting others, to be able to comment on a
| >  file or lines of files without making a fake diff (of the file versus
| >  itself or a blank file). Then only one column would be needed.
|
| Yeah, this use case is not well supported. In my experience with the
| internal tool at Google, I don't think that anybody has ever requested
| that feature, so perhaps in practice it's not so common. I mean, who
| wants to review a 5000-line file once it's checked in? :-) The right
| point for such a review (certainly this is the case at Google) is when
| it goes in.

I am thinking of an entirely different scenario: a package of modules that 
are maybe a few hundred lines each and that accompany a book and are meant 
for human reading as much or more than for machine execution.

Or this: 15 minutes ago I was reading a PEP and discovered that a link did 
not work.  So I find the non-clickable author email at the top and notify 
the author with my email program.  But how much nicer to double click an 
adjacent line and stick the comment in place (and let your system do the 
emailing).  (I presume the sponsor of an item in your system can remove 
no-longer-needed comments.)  So I guess I am thinking of your system as one 
for collaborative online editing rather than just patch review.

Terry


From guido at python.org  Fri May  2 01:29:13 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 16:29:13 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <fvdjf3$hda$1@ger.gmane.org>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<fvddga$gf$1@ger.gmane.org>
	<ca471dc20805011541y63dd132eo6e67310eaeea3ffa@mail.gmail.com>
	<fvdjf3$hda$1@ger.gmane.org>
Message-ID: <ca471dc20805011629n2ef1185cl8fee1139f23ab6c4@mail.gmail.com>

On Thu, May 1, 2008 at 4:24 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>
>  "Guido van Rossum" <guido at python.org> wrote in message
>  news:ca471dc20805011541y63dd132eo6e67310eaeea3ffa at mail.gmail.com...
>
> | On Thu, May 1, 2008 at 2:42 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>  | > As I understood this, one needs a diff to comment on.
>  | >  I can imagine wanting, or wanting others, to be able to comment on a
>  | >  file or lines of files without making a fake diff (of the file versus
>  | >  itself or a blank file). Then only one column would be needed.
>  |
>  | Yeah, this use case is not well supported. In my experience with the
>  | internal tool at Google, I don't think that anybody has ever requested
>  | that feature, so perhaps in practice it's not so common. I mean, who
>  | wants to review a 5000-line file once it's checked in? :-) The right
>  | point for such a review (certainly this is the case at Google) is when
>  | it goes in.
>
>  I am thinking of an entirely different scenario: a package of modules that
>  are maybe a few hundred lines each and that accompany a book and are meant
>  for human reading as much or more than for machine execution.
>
>  Or this: 15 minutes ago I was reading a PEP and discovered that a link did
>  not work.  So I find the non-clickable author email at the top and notify
>  the author with my email program.  But how much nicer to double click an
>  adjacent line and stick the comment in place (and let your system do the
>  emailing).  (I presume the sponsor of an item in your system can remove
>  no-longer-needed comments.)  So I guess I am thinking of your system as one
>  for collaborative online editing rather than just patch review.

I agree that those are all great use cases. Eventually we'll be able
to support these; right now though, I'd like to focus on the more
immediate need (IMO) of patch reviews.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ndbecker2 at gmail.com  Fri May  2 01:37:55 2008
From: ndbecker2 at gmail.com (Neal Becker)
Date: Thu, 01 May 2008 19:37:55 -0400
Subject: [Python-3000] Invitation to try out open source code review tool
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
Message-ID: <fvdk8j$iuh$2@ger.gmane.org>

It would be really nice to see support for some other backends, such as Hg
or bzr (which are both written in python), in addition to svn.


From guido at python.org  Fri May  2 01:45:01 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 16:45:01 -0700
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <481A35D8.60604@cheimes.de>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
Message-ID: <ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>

On Thu, May 1, 2008 at 2:27 PM, Christian Heimes <lists at cheimes.de> wrote:
> Barry Warsaw schrieb:
>
> > This is a reminder that the LAST planned alpha releases of Python 2.6
>  > and 3.0 are scheduled for next Wednesday, 07-May-2008.  Please be
>  > diligent over the next week so that none of your changes break Python.
>  > The stable buildbots look moderately okay, let's see what we can do
>  > about getting them all green:
>
>  I like to draw some attention to two features for the last alpha:
>
>  PEP 370: Per user site-packages directory
>  http://www.python.org/dev/peps/pep-0370/

I like this, except one issue: I really don't like the .local
directory. I don't see any compelling reason why this needs to be
~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide
it from view, especially since the user is expected to manage this
explicitly.
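
For anyone wanting to check where the per-user directory ends up on their
own system, here is a sketch using the site module functions that shipped
with PEP 370. The paths printed depend on the platform and on the
PYTHONUSERBASE environment variable defined by the PEP.

```python
import site

print(site.getuserbase())           # base directory for per-user installs
print(site.getusersitepackages())   # per-user site-packages directory
print(site.ENABLE_USER_SITE)        # whether that directory is put on sys.path
```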

>  Alternative memory allocation for ints, floats and longs using PyMalloc
>  instead of the current block allocation. The issue has been discussed in
>  great length a few months ago but without a final decision.
>  http://bugs.python.org/issue2039

I might look at this later; but it seems to me to be a pure
optimization and thus not required to be in before the first beta.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri May  2 01:45:33 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 16:45:33 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <fvdk8j$iuh$2@ger.gmane.org>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<fvdk8j$iuh$2@ger.gmane.org>
Message-ID: <ca471dc20805011645s239ec048l8d865703f065ef5d@mail.gmail.com>

On Thu, May 1, 2008 at 4:37 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
> It would be really nice to see support for some other backends, such as Hg
>  or bzr (which are both written in python), in addition to svn.

Once it's open source feel free to add those!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From adlaiff6 at gmail.com  Fri May  2 01:47:51 2008
From: adlaiff6 at gmail.com (Leif Walsh)
Date: Thu, 1 May 2008 19:47:51 -0400 (EDT)
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <fvdk8j$iuh$2@ger.gmane.org>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<fvdk8j$iuh$2@ger.gmane.org>
Message-ID: <Pine.LNX.4.64.0805011947250.19044@lappy>

On Thu, 1 May 2008, Neal Becker wrote:
> It would be really nice to see support for some other backends, such as Hg
> or bzr (which are both written in python), in addition to svn.

/me starts the clamour for git

-- 
Cheers,
Leif

From barry at python.org  Fri May  2 01:54:40 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 1 May 2008 19:54:40 -0400
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
Message-ID: <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 1, 2008, at 7:45 PM, Guido van Rossum wrote:

> On Thu, May 1, 2008 at 2:27 PM, Christian Heimes <lists at cheimes.de>  
> wrote:
>> Barry Warsaw schrieb:
>>
>>> This is a reminder that the LAST planned alpha releases of Python  
>>> 2.6
>>> and 3.0 are scheduled for next Wednesday, 07-May-2008.  Please be
>>> diligent over the next week so that none of your changes break  
>>> Python.
>>> The stable buildbots look moderately okay, let's see what we can do
>>> about getting them all green:
>>
>> I like to draw some attention to two features for the last alpha:
>>
>> PEP 370: Per user site-packages directory
>> http://www.python.org/dev/peps/pep-0370/
>
> I like this, except one issue: I really don't like the .local
> directory. I don't see any compelling reason why this needs to be
> ~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide
> it from view, especially since the user is expected to manage this
> explicitly.

Interesting.  I'm of the opposite opinion.  I really don't want Python  
dictating to me what my home directory should look like (a dot file  
doesn't count because so many tools conspire to hide it from me).  I  
guess there's always $PYTHONUSERBASE, but I think I will not be  
alone. ;)

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSBpYQ3EjvBPtnXfVAQLY+AP/dy7qoQKNEJiKtlwqCtw7LUCMLMQylBX8
DfbIonOnAaKHzjveyswuxVeAEq/C/fxssOGMhyd++H/1koJHjBdIHp47+RgohbHQ
1xCyA6Qj8f6xM3xdCR7lRuIDdjb6Tb/iCIQT/dHLrYxEf+VGUC+xVa3JIXdfJu4s
kUYg7tU8SQ8=
=xJWG
-----END PGP SIGNATURE-----

From guido at python.org  Fri May  2 03:55:56 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 18:55:56 -0700
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
Message-ID: <ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>

On Thu, May 1, 2008 at 5:03 PM,  <glyph at divmod.com> wrote:
> On 11:45 pm, guido at python.org wrote:
>
> > I like this, except one issue: I really don't like the .local
> > directory. I don't see any compelling reason why this needs to be
> > ~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide
> > it from view, especially since the user is expected to manage this
> > explicitly.
> >
>
>  I've previously given a spirited defense of ~/.local on this list (
> http://mail.python.org/pipermail/python-dev/2008-January/076173.html ) among
> other places.
>
>  Briefly, "lib" is not the only directory participating in this convention;
> you've also got the full complement of other stuff that might go into an
> installation like /usr/local.  So, while "lib" might annoy me a little, "bin
> etc games include lib lib32 man sbin share src" is going to get ugly pretty
> fast, especially if this is what comes up in Finder or Nautilus or Explorer
> every time I open a window.

Unless I misread the PEP, there's only going to be a lib subdirectory.
Python packages don't put stuff in other places AFAIK.
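
For readers checking this point against a current CPython (which long postdates this thread), the directories of the per-user install scheme can be listed with the standard `sysconfig` module. This is only a sketch, assuming Python 3.2 or later; the helper name `user_scheme_paths` is invented for illustration and is not part of the PEP.

```python
import os
import sysconfig


def user_scheme_paths():
    """Return the install paths of the per-user scheme as a dict."""
    # "posix_user" and "nt_user" are scheme names defined by sysconfig.
    # macOS framework builds have a separate "osx_framework_user" scheme,
    # which this sketch deliberately ignores.
    scheme = "nt_user" if os.name == "nt" else "posix_user"
    return sysconfig.get_paths(scheme)


if __name__ == "__main__":
    for key, path in sorted(user_scheme_paths().items()):
        print(f"{key:12} {path}")
```

On the interpreters this was written against, the keys include `scripts`, `include` and `data` alongside the library directories, so the per-user scheme there is not limited to a single lib directory.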

On the Mac, the default Finder window is not your home directory but
your Desktop, which is a subdirectory thereof with a markedly public
name. In fact, OS X has a whole bunch of reserved names in your home
directory, and none of them start with a dot. The rule seems to be
that if it contains stuff that the user cares about, it doesn't start
with a dot.

> If it's going to be a visible directory on the
> grounds that this is a Python-specific thing that is explicitly *not*
> participating in a convention with other software, then please call it
> "~/Python" or something.

Much better than ~/.local/ IMO.

>  Am I the only guy who finds software that insists on visible, fixed files
> in my home directory rude?  vmware, for example, wants a "~/vmware"
> directory, but pretty much every other application I use is nice enough to
> use dotfiles (even cedega, with a roughly-comparable-to-lib "applications
> I've installed for you" folder).

The distinction to my mind is that most dot files (with the exception
of a few like .profile or .bashrc) are not managed by most users --
the apps that manage them provide APIs for manipulating their
contents.  (Sort of like the Windows registry.)  Non-dot files are for
stuff that the user needs to be aware of.

I'm not sure where Python packages fall, but ISTM that this is
something a user must explicitly choose as the target of an installer.
The user is also likely to have to dig through there to remove stuff,
as Python package management doesn't have a way to remove packages.

>  Put another way - it's trivial to make ~/.local/lib show up by symlinking
> ~/lib,

That's not the same thing at all.

> but you can't make ~/lib disappear, and lots of software ends up
> looking at ~.

But what software cares about another file there? My home directory is
mostly a switching point where I have quick access to everything I
access regularly.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Fri May  2 04:31:20 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 1 May 2008 19:31:20 -0700
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
Message-ID: <bbaeab100805011931i29e497e3vdb67b268644d9357@mail.gmail.com>

On Thu, May 1, 2008 at 1:26 PM, Barry Warsaw <barry at python.org> wrote:
>  This is a reminder that the LAST planned alpha releases of Python 2.6 and
> 3.0 are scheduled for next Wednesday, 07-May-2008.  Please be diligent over
> the next week so that none of your changes break Python.  The stable
> buildbots look moderately okay, let's see what we can do about getting them
> all green:
>
>  http://www.python.org/dev/buildbot/stable/
>
>  We have a few showstopper bugs, and I will be looking at these more
> carefully starting next week.
>
>
> http://bugs.python.org/issue?@columns=title,id,activity,versions,status&@sort=activity&@filter=priority,status&@pagesize=50&@startwith=0&priority=1&status=1&@dispname=Showstoppers
>
>  Time is running short to get any new features into Python 2.6 and 3.0.  The
> release after this one is scheduled to be the first beta release, at which
> time we will institute a feature freeze.  If your feature doesn't make it in
> by then, you'll have to wait until 2.7/3.1.  If there is something that
> absolutely must go into 2.6/3.0 be sure that there is a bug issue open for
> it and that the Priority is set to 'release blocker'.  I may reduce it to
> critical for the next alpha, but we'll review all the release blocker and
> critical issues for the first 2.6 and 3.0 beta releases.

I just closed the release blocker I created (the
backwards-compatibility issue with warnings.showwarning() ). I would
like to add a PendingDeprecationWarning (or stronger) to 2.6 for
showwarning() implementations that don't support the optional 'line'
argument. I guess the best way to do it in C code would be to see if
PyFunction_GetDefaults() returns a tuple of length two (since
showwarning() already has a single optional argument as it is).

Anyone have an issue with me doing this? Is PendingDeprecationWarning
safe enough for 2.6? Or should this be a 3.0-only thing with a
DeprecationWarning?

-Brett
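
A pure-Python analogue of the check described above might look like the following. It is a sketch, not the change that was proposed for the C code: it uses `inspect.signature` (Python 3.3+) rather than `PyFunction_GetDefaults()`, and the names `accepts_line_argument`, `old_style` and `new_style` are hypothetical.

```python
import inspect


def accepts_line_argument(func):
    """Return True if func looks able to take a 'line' argument."""
    try:
        params = inspect.signature(func).parameters.values()
    except (TypeError, ValueError):
        # Not introspectable (e.g. some builtins): assume it is fine.
        return True
    for param in params:
        # A parameter literally named "line", or a **kwargs catch-all,
        # is enough for a keyword call with line=... to succeed.
        if param.name == "line" or param.kind is param.VAR_KEYWORD:
            return True
    return False


# Hypothetical replacement implementations, before and after the change.
def old_style(message, category, filename, lineno, file=None):
    pass


def new_style(message, category, filename, lineno, file=None, line=None):
    pass


if __name__ == "__main__":
    print(accepts_line_argument(old_style))
    print(accepts_line_argument(new_style))
```

The defaults-tuple length mentioned in the message is visible from Python as `func.__defaults__`: one entry for `old_style`, two for `new_style`.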

From musiccomposition at gmail.com  Fri May  2 04:35:12 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Thu, 1 May 2008 21:35:12 -0500
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <bbaeab100805011931i29e497e3vdb67b268644d9357@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<bbaeab100805011931i29e497e3vdb67b268644d9357@mail.gmail.com>
Message-ID: <1afaf6160805011935k6dc045bexd6fb54f112f2307@mail.gmail.com>

On Thu, May 1, 2008 at 9:31 PM, Brett Cannon <brett at python.org> wrote:
>
>  I just closed the release blocker I created (the
>  backwards-compatibility issue with warnings.showwarning() ). I would
>  like to add a PendingDeprecationWarning (or stronger) to 2.6 for
>  showwarning() implementations that don't support the optional 'line'
>  argument. I guess the best way to do it in C code would be to see if
>  PyFunction_GetDefaults() returns a tuple of length two (since
>  showwarning() already has a single optional argument as it is).
>
>  Anyone have an issue with me doing this? Is PendingDeprecationWarning
>  safe enough for 2.6? Or should this be a 3.0-only thing with a
>  DeprecationWarning?

I vote for a full DeprecationWarning.
>
>  -Brett


-- 
Cheers,
Benjamin Peterson

From g.brandl at gmx.net  Fri May  2 05:28:19 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 02 May 2008 05:28:19 +0200
Subject: [Python-3000] Problems with the new super()
In-Reply-To: <ca471dc20805011255i72d1a897y7385f3aff4ec18f2@mail.gmail.com>
References: <loom.20080430T210749-341@post.gmane.org>	<ca471dc20804301457m56c41a46w6cb394f9bf34bdd7@mail.gmail.com>	<loom.20080501T081959-316@post.gmane.org>
	<fvd1lq$pe1$1@ger.gmane.org>
	<ca471dc20805011255i72d1a897y7385f3aff4ec18f2@mail.gmail.com>
Message-ID: <fve1ol$ghd$1@ger.gmane.org>

Guido van Rossum schrieb:
> On Thu, May 1, 2008 at 11:20 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>>  But the other two magical things about super() really bother me too. I
>>  haven't looked at the new super in detail so far (and I don't know how
>>  many others have), and two things are really strikingly unpythonic in
>>  my view:
>>
>>  * super() only works when named "super" [1]. It shouldn't be a function if
>>   it has that property; no other Python function has that.
> 
> Actually, I believe IronPython and/or Jython have to use this trick in
> certain cases -- at least I recall Jim Hugunin talking about
> generating different code when the use of locals() was detected.

I don't know if it's possible in Jython to have "locals" referring to
something else. For CPython, the name "super" in a function can refer to
anything -- local, global or builtin -- and it just feels wrong for the
compiler to make assumptions based on the mere mention of a non-reserved
name.

> I'm not proud of this, but I don't see a way around it. The
> alternative would be to make it a keyword, which seemed excessive
> (plus, it would be odd if super() were a keyword when self is not).

I don't find it odd. In fact, IMO the whole magic needed for the runtime
implementation of "super()" justifies super becoming a keyword.

Georg

[Moving this to the Python-3000 list]
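
The name dependence under discussion can be reproduced with a short script. This sketch reflects the behaviour of current CPython 3 releases, which may differ in detail from the alpha being discussed; the class names and the `_alias` variable are made up for illustration.

```python
class Base:
    def greet(self):
        return "base"


class Child(Base):
    def greet(self):
        # The compiler sees the name "super" in this body and gives the
        # method a hidden __class__ cell, which zero-argument super() reads.
        return "child+" + super().greet()


_alias = super  # the same builtin object under another name


class Aliased(Base):
    def greet(self):
        # Neither of the two special names is mentioned in this body, so
        # no cell is created and the zero-argument call cannot find the class.
        return "aliased+" + _alias().greet()


def try_aliased():
    """Call Aliased().greet() and report the error as a string."""
    try:
        return Aliased().greet()
    except RuntimeError as exc:
        return "RuntimeError: " + str(exc)


if __name__ == "__main__":
    print(Child().greet())
    print(try_aliased())
```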

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From guido at python.org  Fri May  2 05:49:18 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 20:49:18 -0700
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
Message-ID: <ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>

I stand corrected on a few points. You've convinced me that ~/lib/ is
wrong. But I still don't like ~/.local/, not least because
it's not any more local than any other dot files or directories. The
"symmetry" with /usr/local/ is pretty weak, and certainly won't help
beginning users.

As a compromise, I'm okay with ~/Python/. I would like to be able to
say that the user explicitly has to set an environment variable in
order to benefit from this feature, just like with $PYTHONPATH and
$PYTHONSTARTUP. But that might defeat the point of making this easy to
use for noobs.

On OS X I think we should put this somewhere under ~/Library/. Just
put it in a different place than where the Python framework puts its
stuff.
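
For reference, released versions of Python expose the per-user base directory through the `site` module, and `$PYTHONUSERBASE` overrides it. The sketch below assumes a current CPython 3; the helper `user_base_with_override` and the `demo-userbase` path are hypothetical, and a fresh interpreter is started because `site` caches the value in the running process.

```python
import os
import site
import subprocess
import sys


def user_base_with_override(path):
    """Ask a fresh interpreter for its user base with PYTHONUSERBASE set."""
    env = dict(os.environ, PYTHONUSERBASE=path)
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.getuserbase())"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("default user base:   ", site.getuserbase())
    print("default user site:   ", site.getusersitepackages())
    demo = os.path.abspath("demo-userbase")
    print("overridden user base:", user_base_with_override(demo))
```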

On Thu, May 1, 2008 at 8:25 PM,  <glyph at divmod.com> wrote:
> On 01:55 am, guido at python.org wrote:
>
> > On Thu, May 1, 2008 at 5:03 PM,  <glyph at divmod.com> wrote:
> >
>
>  Hi everybody.  I apologize for writing yet another lengthy screed about a
> simple directory naming issue.  I feel strongly about it but I encourage
> anyone who doesn't to simply skip it.
>
>  First, some background: my strong feelings here are actually based on an
> experience I had a long time ago when helping someone with some C++
> programming homework.  They were baffled because when I helped them the
> programs compiled, but then as soon as they tried it on their own it didn't.
> The issue was that I had replicated my own autotools-friendly directory
> structure for them (at the time, "~/bin", "~/include", "~/lib", "~/etc", and
> so on managed with GNU stow) onto their machine and edited their shell setup
> to include them appropriately.  But, as soon as I was finished, they
> "cleaned up" the "mess" I had left behind, and thereby removed all of their
> build dependencies.  This was on a shared university build server, before
> the days of linux as a friendly, graphical operating system which encouraged
> you to look even more frequently at your home directory, so if anything I
> suspect the likelihood that this is a problem would be worse now.  Since
> cleaning up my own home directory, of course, I find that I appreciate the
> lack of visual noise in Nautilus et al. as well.
>
>  Also, while I obviously think all tools should work this way, I think that
> Python in particular will attract an audience who is learning to program but
> not necessarily savvy with arcane nuances of filesystem layout, and it would
> be best if those details were abstracted.
>
>  My concern here is for the naive python developer reading installation
> instructions off of a wiki and trying to get started with Twisted
> development.  Seeing a directory created in your home directory (or, as the
> case may be, 3 directories, "bin", "lib", and "include") is a bit of a
> surprise.  They don't actually care where the files in their installed
> library are, as long as they're "installed", and they can import them.
> However, they may care that clicking on the little house icon now shows not
> "Pictures", "Movies", etc, but "lib" (what's a 'lib'?) "bin" (what's a bin?
> is that like a box where I throw my stuff?) "share" (I put my stuff in
> "share", but it's not shared.  Wait, I'm supposed to put it in "Public"?).
>
>
> >
> > >  Briefly, "lib" is not the only directory participating in this
> convention;
> > > you've also got the full complement of other stuff that might go into an
> > > installation like /usr/local.  So, while "lib" might annoy me a little,
> "bin
> > > etc games include lib lib32 man sbin share src" is going to get ugly
> pretty
> > > fast, especially if this is what comes up in Finder or Nautilus or
> Explorer
> > > every time I open a window.
> > >
> >
> > Unless I misread the PEP, there's only going to be a lib subdirectory.
> > Python packages don't put stuff in other places AFAIK.
> >
>
>  Python packages, at the very least, frequently put stuff in "bin" (or
> "scripts", I think, on Windows).  Not all Python packages are pure-Python
> packages either; setup.py boasts --install-platlib, --install-headers,
> --install-data, and --exec-prefix options, which suggests an "include",
> "bin", and "share" directory, at least.  I'm sure if I had more time to
> grovel around I'd find one that installed manpages. Twisted has some, but
> apparently setup.py doesn't do anything with them, we leave that to the OS
> packages...
>
>  Of course, very little of this is handled by the PEP.  But even the usage
> of the name "lib" implies that the PEP is taking some care to be compatible
> with an idiom that goes beyond Python itself here, or at least beyond simple
> Python packages.
>
>  Even assuming that no Python library ever wanted to install any of these
> things, there are many Python libraries which are simply wrappers around
> lower-level libraries, and if I want to perform a per-user install of one of
> those, I am going to ./configure --prefix=~/something (and by "something", I
> mean ".local" ;)) and it would be nice to have Python living in the same
> space.  For that matter it'd be nice to get autotools and Ruby and PHP and
> Perl and Emacs (ad nauseam) all looking at ~/.local as a mirror of /usr, so
> that I didn't have to write a bunch of shell bootstrap glue to get
> everything to behave consistently, or learn the new, special names for bits
> of configuration under "~" that are different from the ones under /usr/local
> or /etc.
>
>  I replicate a consistent Python development environment with a ton of
> bizarre dependencies across something like 15 different OS installations
> (not to mention a bevy of virtual machines I keep around just for fun), so I
> think about these issues a lot.  Most of these machines are macs and linux
> boxes, but I do my best on Windows too.  FWIW I don't have any idea what the
> right thing to do is on Windows; ".local" doesn't particularly make sense,
> but neither does "lib" in that context. There's no reasonable guess as to
> where to put scripts, or dependent shared libraries... but then, per-user
> installation is less of an issue on Windows.
>
>
> > On the Mac, the default Finder window is not your home directory but
> > your Desktop, which is a subdirectory thereof with a markedly public
> > name. In fact, OS X has a whole bunch of reserved names in your home
> > directory, and none of them start with a dot. The rule seems to be
> > that if it contains stuff that the user cares about, it doesn't start
> > with a dot.
> >
>
>  Hmm.  On my Mac laptop, the default Finder window is definitely my home
> directory; this may be an artifact of many OS upgrades or some tweak that I
> performed a long time ago and forgot about, though.  Apologies if that is
> not the average user experience.
>
>  For what it's worth, Ubuntu also has some directories that it creates:
> Desktop, Pictures, Documents, Examples, Templates, Videos.  These are empty,
> and I typically delete the ones I don't use.
>
>
> >
> > > If it's going to be a visible directory on the
> > > grounds that this is a Python-specific thing that is explicitly *not*
> > > participating in a convention with other software, then please call it
> > > "~/Python" or something.
> > >
> >
> > Much better than ~/.local/ IMO.
> >
>
>  It depends how this is being perceived.  If this is Python mirroring the
> /usr/local layout convention for users, as the name "lib" implies, then this
> is worse.  However, if Python is just trying to select a location for its
> own library bookkeeping and not allow the installation of platform libraries
> or scripts using this mechanism... well, ~/.python.d would still be my
> preference ;-) but I could at least understand "Python" as mirroring the
> Mac, GNOME and KDE convention for a few very special directories.
>
>
> >
> > >  Am I the only guy who finds software that insists on visible, fixed
> files
> > > in my home directory rude?  vmware, for example, wants a "~/vmware"
> > > directory, but pretty much every other application I use is nice enough
> to
> > > use dotfiles (even cedega, with a roughly-comparable-to-lib
> "applications
> > > I've installed for you" folder).
> > >
> >
> > The distinction to my mind is that most dot files (with the exception
> > of a few like .profile or .bashrc) are not managed by most users --
> > the apps that manage them provide APIs for manipulating their
> > contents.  (Sort of like the Windows registry.)  Non-dot files are for
> > stuff that the user needs to be aware of.
> >
>
>  My experience of modern Linux suggests that the usage you're describing is
> gradually being phased out - applications that want to manage some
> non-user-visible storage in something like the registry increasingly use
> gconf (or a database, in server-land).  Granted, gconf itself is stored in
> dotfiles, but it's just a few.
>
>  In my home directory I have, in version control, variously written by hand
> or databases maintained from externally downloaded stuff:
>
>    ~/.asoundrc
>    ~/.emacs
>    ~/.vimrc
>    ~/.vim
>    ~/.Xresources
>    ~/.fonts
>    ~/.gnomerc
>    ~/.inputrc
>    ~/.bashrc
>    ~/.bash_profile
>    ~/.profile
>    ~/.screenrc
>    ~/.ssh/config
>    ~/.ssh/authorized_keys
>    ~/.ssh/known_hosts
>
>  I know about these dot files and I care about them and I maintain them, but
> they're there for the benefit of particular pieces of software, not me.
> There are a lot of other dotfiles there, but I don't think that this set is
> "a few";   I am quite happy that I don't have to see every one of them every
> time I am looking at my home directory in a "save as" dialog.
>
>
> > I'm not sure where Python packages fall, but ISTM that this is
> > something a user must explicitly choose as the target of an installer.
> > The user is also likely to have to dig through there to remove stuff,
> > as Python package management doesn't have a way to remove packages.
> >
>
>  I hope that users never have to explicitly choose this as the target of the
> installer; I was under the impression that the point of adding this feature
> was to allow the default behavior of distutils to work simply and
> automatically on UNIX-y platforms rather than puking about permissions, or
> requiring arcana like  "sudo" access or editing your shell's startup.  I am
> quietly agitating elsewhere to get ~/.local/bin added to $PATH by default,
> by the way ;-).  (~/.local/lib on $LD_LIBRARY_PATH is a hard sell, but that
> too...)
>
>  Once you have to know about it and explicitly choose it it's not much more
> work to set all the appropriate shell environment variables yourself.  And,
> for that matter, *I* already have, so I suppose regardless of the outcome of
> this discussion I'll still have a ~/.local :-).
>
>
> >
> > >  Put another way - it's trivial to make ~/.local/lib show up by
> symlinking
> > > ~/lib,
> > >
> >
> > That's not the same thing at all.
> >
>
>  I'm not sure what you're saying it's not the same as.  All I'm saying is
> that if advanced users want to show it, they'll symlink it; if naive users
> want to hide it, they'll delete it and break python, possibly without
> knowing why ;).
>
>
> >
> > > but you can't make ~/lib disappear, and lots of software ends up
> > > looking at ~.
> > >
> >
> > But what software cares about another file there? My home directory is
> > mostly a switching point where I have quick access to everything I
> > access regularly.
> >
>
>  Nothing's going to break, if that's what you mean.  No software processes
> the list of ~ and does anything with it; but lots of stuff shows me that
> list.  In GNOME, on Ubuntu, when a "choose file" dialog comes up, 80% of the
> time it comes up by default in my home directory. When I open a terminal it
> opens in my home directory.  The default location for Emacs is my home
> directory.  I can quickly measure my cognitive load by looking at the
> contents of that directory.  Since my shell starts there, autocomplete
> starts there, and so common-letter real estate is scarce.  I have a
> directory called "Projects" that I currently autocomplete with 'p<tab>' and
> a directory called 'Linux' that I autocomplete with 'l<tab>'; either
> public-name proposal will have me typing an additional letter on these every
> day ;-).
>
>  In other words, I care about another file there.  I use my home directory
> as a sort of to-do list; it's mostly empty unless I have a lot going on, in
> which case it fills up with various objects I'm working on, and then I empty
> it out again.  There are a few exceptions to this rule; on every platform
> there are a few things the OS puts there, but they are generally things like
> "Pictures", "Desktop", and "Music"... where I put pictures, downloaded
> files, and music. The Mac's "Library" directory has never bothered me, since
> it's OS-provided and basically an alternate location for dotfiles.
> ("Application Data" and friends are another story.)
>
>  In a way, I agree with you.  "everything I access regularly" is a good
> description of my home directory.  Except, this "lib" directory is not
> something I want to access regularly; very occasionally, maybe once every
> few weeks, I want to chuck some dependency in there and then forget about it
> for a year.
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri May  2 05:54:33 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 1 May 2008 20:54:33 -0700
Subject: [Python-3000] Problems with the new super()
In-Reply-To: <fve1ol$ghd$1@ger.gmane.org>
References: <loom.20080430T210749-341@post.gmane.org>
	<ca471dc20804301457m56c41a46w6cb394f9bf34bdd7@mail.gmail.com>
	<loom.20080501T081959-316@post.gmane.org> <fvd1lq$pe1$1@ger.gmane.org>
	<ca471dc20805011255i72d1a897y7385f3aff4ec18f2@mail.gmail.com>
	<fve1ol$ghd$1@ger.gmane.org>
Message-ID: <ca471dc20805012054y6595922pcfbe7ec9ba8b32c3@mail.gmail.com>

This whole movement to condemn super because it's not "pure" strikes
me as wasted energy. That's my last word.

On Thu, May 1, 2008 at 8:28 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> Guido van Rossum schrieb:
>
>
> > On Thu, May 1, 2008 at 11:20 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> >
> > >  But the other two magical things about super() really bother me too. I
> > >  haven't looked at the new super in detail so far (and I don't know how
> > >  many others have), and two things are really strikingly unpythonic in
> > >  my view:
> > >
> > >  * super() only works when named "super" [1]. It shouldn't be a function
> if
> > >  it has that property; no other Python function has that.
> > >
> >
> > Actually, I believe IronPython and/or Jython have to use this trick in
> > certain cases -- at least I recall Jim Hugunin talking about
> > generating different code when the use of locals() was detected.
> >
>
>  I don't know if it's possible in Jython to have "locals" referring to
>  something else. For CPython, the name "super" in a function can refer to
>  anything -- local, global or builtin -- and it just feels wrong for the
>  compiler to make assumptions based on the mere mention of a non-reserved
>  name.
>
>
>
> > I'm not proud of this, but I don't see a way around it. The
> > alternative would be to make it a keyword, which seemed excessive
> > (plus, it would be odd if super() were a keyword when self is not).
> >
>
>  I don't find it odd. In fact, IMO the whole magic needed for the runtime
>  implementation of "super()" justifies super becoming a keyword.
>
>  Georg
>
>  [Moving this to the Python-3000 list]
>
>
>  --
>  Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
>  Four shall be the number of spaces thou shalt indent, and the number of thy
>  indenting shall be four. Eight shalt thou not indent, nor either indent
> thou
>  two, excepting that thou then proceed to four. Tabs are right out.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mhammond at skippinet.com.au  Fri May  2 07:57:31 2008
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Fri, 2 May 2008 15:57:31 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <20080502050720.GO78165@nexus.in-nomine.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<20080502050720.GO78165@nexus.in-nomine.org>
Message-ID: <05bb01c8ac19$6db816c0$49284440$@com.au>

> Is there a reliable way to identify 32-bits and 64-bits Windows from
> within Python?

Not that I'm aware of.  'sys.platform=="win32" and "64 bit" in sys.version' will be reliable when it returns True, but it might be wrong when it returns False (although when it returns False, things will look a lot like a 32bit OS).

The best way I can find for the win32 API to tell you this is a combination of the above and IsWow64Process() (which returns True if you are a 32bit process on a 64bit platform).
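
The combination described above could be sketched with `ctypes` as follows. This is an illustration under assumptions, not an existing `sys` API: the function names are hypothetical, the banner substring is assumed to be "64 bit" (as in `[MSC v.1500 64 bit (AMD64)]`), and on non-Windows platforms the WOW64 check simply returns False.

```python
import ctypes
import sys


def is_64bit_windows_python(platform=sys.platform, version=sys.version):
    """True for a 64-bit Windows build of CPython, judged by its banner."""
    return platform == "win32" and "64 bit" in version


def is_wow64_process():
    """True if this is a 32-bit process on 64-bit Windows, else False."""
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.windll.kernel32
    # Declare the handle type so it is not truncated on 64-bit builds.
    kernel32.GetCurrentProcess.restype = ctypes.c_void_p
    kernel32.IsWow64Process.argtypes = [
        ctypes.c_void_p,
        ctypes.POINTER(ctypes.c_int),
    ]
    flag = ctypes.c_int(0)
    ok = kernel32.IsWow64Process(kernel32.GetCurrentProcess(), ctypes.byref(flag))
    return bool(ok) and bool(flag.value)


def is_64bit_windows():
    """Either a 64-bit Python, or a 32-bit Python running under WOW64."""
    return is_64bit_windows_python() or is_wow64_process()


if __name__ == "__main__":
    print("64-bit Windows:", is_64bit_windows())
```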

I'd be interested to know why you care though - ie, how will the behavior of your programs depend on that?  The virtualization compatibility hacks which, as best I can tell, are currently enabled for Python mean that the answer to the question might not be as useful as people might think.  But I'm sure valid reasons for wanting to know this exist, so I'd be happy to create a patch which adds a new sys.iswow64process() function if desired.

Cheers,

Mark


From jbarham at gmail.com  Fri May  2 08:50:52 2008
From: jbarham at gmail.com (John Barham)
Date: Thu, 1 May 2008 23:50:52 -0700
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
	<20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com>
Message-ID: <4f34febc0805012350v6251650do28d46ff5f5577421@mail.gmail.com>

> I think it would be great if Python were the first real adopter of this
> convention...

A convention without any adopters?  Seems like a non sequitur...

From lists at cheimes.de  Fri May  2 10:30:20 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 02 May 2008 10:30:20 +0200
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
Message-ID: <481AD11C.4020806@cheimes.de>

Guido van Rossum schrieb:
> I like this, except one issue: I really don't like the .local
> directory. I don't see any compelling reason why this needs to be
> ~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide
> it from view, especially since the user is expected to manage this
> explicitly.

The directory name has been commented on by glyph at great length
(again). Thanks glyph! I'm all on his side. The base directory for
Python-related files should be a dot directory directly under the
user's home directory. I slightly prefer ~/.local/ over other suggestions
but I'm also open to ~/.python.d/

Should I wait with the commit until we have agreed on a directory name
or do you want me to commit the code now?

> I might look at this later; but it seems to me to be a pure
> optimization and thus not required to be in before the first beta.

Correct, it's an optimization to enhance the memory utilization.

Christian

From steve at holdenweb.com  Fri May  2 10:49:17 2008
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 02 May 2008 04:49:17 -0400
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
Message-ID: <481AD58D.2010201@holdenweb.com>

Guido van Rossum wrote:
> I stand corrected on a few points. You've convinced me that ~/lib/ is
> wrong. But I still don't like ~/.local/; not least because
> it's not any more local than any other dot files or directories. The
> "symmetry" with /usr/local/ is pretty weak, and certainly won't help
> beginning users.
> 
So it's the *name* you don't like rather than the invisibility?

> As a compromise, I'm okay with ~/Python/. I would like to be able to
> say that the user explicitly has to set an environment variable in
> order to benefit from this feature, just like with $PYTHONPATH and
> $PYTHONSTARTUP. But that might defeat the point of making this easy to
> use for noobs.
> 
Groan. Then everyone else realizes what a "great idea" this is, and we 
see ~/Perl/, ~/Ruby/, ~/C# (that'll screw the Microsoft users, a 
directory with a comment marker in its name), ~/Lisp/ and the rest? I 
don't think people would thank us for that in the long term.

I'm about +10 on invisibility, for the simple reason that "hiding the 
mechanism" is the right thing to do for naive users, who are the most 
likely to screw things up if given the chance and the most likely to be 
unaware of dot-name directories. If you don't like ~/.local/ then please 
consider ~/.private/ or ~/.personal/ or something else, but don't 
gratuitously add a visible subdirectory.

> On OS X I think we should put this somewhere under ~/Library/. Just
> put it in a different place than where the Python framework puts its
> stuff.
> 
Nothing to say about OS X.

One day Windows might start to respect the "hidden dot" convention, but 
perhaps in the interim we could create a (Windows-hidden) ~/.private/? 
Assuming we could work out where to put it ;-)

> On Thu, May 1, 2008 at 8:25 PM,  <glyph at divmod.com> wrote:
[much good sense]

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From lists at cheimes.de  Fri May  2 10:57:21 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 02 May 2008 10:57:21 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <481AD58D.2010201@holdenweb.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
	<481AD58D.2010201@holdenweb.com>
Message-ID: <481AD771.6040802@cheimes.de>

Steve Holden wrote:
> Nothing to say about OS X.
> 
> One day Windows might start to respect the "hidden dot" convention, but
> perhaps in the interim we could create a (Windows-hidden) ~/.private/?
> Assuming we could work out where to put it ;-)

Windows and Mac OS X have dedicated directories for application specific
libraries. That is ~/Library on Mac and Application Data on Windows. The
latter is i18n-ed and called "Anwendungsdaten" in German. Fortunately
Windows sets an environment var to the application data directory.

Christian

From lists at cheimes.de  Fri May  2 11:44:26 2008
From: lists at cheimes.de (Christian Heimes)
Date: Fri, 02 May 2008 11:44:26 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <20080502091633.GV78165@nexus.in-nomine.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
	<481AD58D.2010201@holdenweb.com> <481AD771.6040802@cheimes.de>
	<20080502091633.GV78165@nexus.in-nomine.org>
Message-ID: <481AE27A.10906@cheimes.de>

Jeroen Ruigrok van der Werven wrote:
> "Windows uses the Roaming folder for application specific data, such as
> custom dictionaries, which are machine independent and should roam with the
> user profile. The AppData\Roaming folder in Windows Vista is the same as the
> Documents and Settings\username\Application Data folder in Windows XP."
> 
> I think that's different from what you meant above though, since I doubt
> you'd want this (the libraries) to roam with the user.

As a matter of fact, I *want* the libraries to roam. On the other hand,
this might become an issue if a user roams between a 32-bit and a 64-bit
system ...


From ncoghlan at gmail.com  Fri May  2 12:43:19 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 02 May 2008 20:43:19 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <20080502092008.GW78165@nexus.in-nomine.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>	<481AD58D.2010201@holdenweb.com>
	<20080502092008.GW78165@nexus.in-nomine.org>
Message-ID: <481AF047.5050109@gmail.com>

Jeroen Ruigrok van der Werven wrote:
> -On [20080502 10:50], Steve Holden (steve at holdenweb.com) wrote:
>> Groan. Then everyone else realizes what a "great idea" this is, and we see 
>> ~/Perl/, ~/Ruby/, ~/C# (that'll screw the Microsoft users, a directory with 
>> a comment marker in its name), ~/Lisp/ and the rest? I don't think people 
>> would thank us for that in the long term.
> 
> I'm +1 on just using $HOME/.local, but otherwise $HOME/.python makes sense
> too. $HOME/.python.d doesn't do it for me, too clunky (and hardly used if I
> look at my .files in $HOME).
> 
> But I agree with Steve that it should be a hidden directory.

This sums up my opinion pretty well. Hidden by default, but easy to 
expose (e.g. via a local -> .local symlink) for the more experienced 
users that want it more easily accessible.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Fri May  2 12:51:20 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 02 May 2008 20:51:20 +1000
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <fvcvrd$j4a$1@ger.gmane.org>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<4805ECE1.6040501@gmail.com><20080416124529.GC8598@phd.pp.ru>	<4805FD56.6070902@gmail.com><20080416133046.GB16087@phd.pp.ru>	<480612CE.1010300@gmail.com>	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com><87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>	<4819F21D.8070808@v.loewis.de>
	<fvcvrd$j4a$1@ger.gmane.org>
Message-ID: <481AF228.2080900@gmail.com>

Terry Reedy wrote:
> I think standard Python should somehow have two options: escape everything 
> but ASCII (for unambiguity and old display systems) and escape nothing that 
> is potentially printable (leaving partially capable systems to fare as they 
> will).  In-between solutions will ultimately be programmer and system 
> specific.

If repr() is made to work as Martin suggests (i.e. only escape the 
unprintable stuff), then the unicode_escape codec can be used fairly 
easily to restore the 2.x escape everything non-ASCII behaviour.
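
As a rough sketch of that idea in Python 3 syntax: encoding with the
unicode_escape codec yields pure ASCII bytes, which can then be decoded
back to a str for display. The helper name escape_non_ascii is made up
for illustration, and note that the codec also escapes backslashes and
control characters such as newlines, so it is only an approximation of
the 2.x repr().

```python
def escape_non_ascii(text):
    """Return text with every non-ASCII character written as an escape.

    Hypothetical helper: relies only on the standard unicode_escape codec.
    """
    # The codec emits bytes that contain ASCII characters only.
    return text.encode("unicode_escape").decode("ascii")


print(escape_non_ascii("caf\u00e9"))
```

Pure-ASCII input passes through unchanged, apart from backslashes and
control characters.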

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From barry at python.org  Fri May  2 13:32:52 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 2 May 2008 07:32:52 -0400
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next
	Wednesday	07-May-2008
In-Reply-To: <20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
	<20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com>
Message-ID: <68FCCCBB-7DFF-4157-BE40-F816CBA7AA57@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 2, 2008, at 1:48 AM, glyph at divmod.com wrote:

> etc, though.  In the long term, if everyone followed suit on  
> ~/.local, that would be great.  But I don't want a ~/Python, ~/Java,  
> ~/Ruby, ~/PHP, ~/Perl, ~/OCaml and ~/Erlang and a $PATH as long as  
> my arm just so I can run a few applications without system- 
> installing them.

I hate to send a "me too" messages, but I have to say Glyph is exactly  
right here.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSBr75XEjvBPtnXfVAQIHAgP+JDpOymVEKfFvzZQZd8WtTpY6jsjvntAA
2J38LslMAXJSs3BcRBU/ELcbvTpr/JoEButktAQJCJpIhsmRTV0y3KcS/d/d+Sao
9V3ME2/yZ94qeQheB7jJIhfihNlC7VhG+CjSOMZrRZwm3k2drGGDdfdgGeSGZJOl
B6uCEB0i0iI=
=gup1
-----END PGP SIGNATURE-----

From exarkun at divmod.com  Fri May  2 15:32:49 2008
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Fri, 2 May 2008 09:32:49 -0400
Subject: [Python-3000] warnings.showwarning (was Re: [Python-Dev] Reminder:
 last alphas next Wednesday 07-May-2008)
In-Reply-To: <bbaeab100805011931i29e497e3vdb67b268644d9357@mail.gmail.com>
Message-ID: <20080502133249.6859.2057657874.divmod.quotient.58104@ohm>

On Thu, 1 May 2008 19:31:20 -0700, Brett Cannon <brett at python.org> wrote:
>
> [snip]
>
>I just closed the release blocker I created (the
>backwards-compatibility issue with warnings.showwarning() ). I would
>like to add a PendingDeprecationWarning (or stronger) to 2.6 for
>showwarning() implementations that don't support the optional 'line'
>argument. I guess the best way to do it in C code would be to see if
>PyFunction_GetDefaults() returns a tuple of length two (since
>showwarning() already has a single optional argument as it is).

Hi Brett,

I'm still seeing some strange behavior from the warnings module.  This
can be observed on the community buildbot for Twisted, for example:

http://python.org/dev/buildbot/community/trunk/x86%20Ubuntu%20Hardy%20trunk/builds/171/step-Twisted.zope.stable/0

The log ends with basically all of the warning-related tests in Twisted
failing, reporting that no warnings happened.

There is also some strange behavior that can be easily observed in the REPL:

    exarkun at boson:~/Projects/python/trunk$ ./python 
/home/exarkun/Projects/Divmod/trunk/Combinator/combinator/xsite.py:7: DeprecationWarning: the sets module is deprecated
      from sets import Set
    Python 2.6a2+ (trunk:62636M, May  2 2008, 09:19:41) 
    [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import warnings
    >>> warnings.warn("foo")
    :1: UserWarning: foo       # Where'd the module name go?
    >>> def f(*a):
    ...     print a
    ... 
    >>> warnings.showwarning = f
    >>> warnings.warn("foo")
    >>>                        # Where'd the warning go?

Any ideas on this?

Jean-Paul

From exarkun at divmod.com  Fri May  2 15:47:16 2008
From: exarkun at divmod.com (Jean-Paul Calderone)
Date: Fri, 2 May 2008 09:47:16 -0400
Subject: [Python-3000] [Python-Dev] warnings.showwarning (was Re:
 Reminder: last alphas next Wednesday 07-May-2008)
In-Reply-To: <20080502133249.6859.2057657874.divmod.quotient.58104@ohm>
Message-ID: <20080502134716.6859.1256230877.divmod.quotient.58108@ohm>

On Fri, 2 May 2008 09:32:49 -0400, Jean-Paul Calderone <exarkun at divmod.com> wrote:
>On Thu, 1 May 2008 19:31:20 -0700, Brett Cannon <brett at python.org> wrote:
>>
>>[snip]
>>
>>I just closed the release blocker I created (the
>>backwards-compatibility issue with warnings.showwarning() ). I would
>>like to add a PendingDeprecationWarning (or stronger) to 2.6 for
>>showwarning() implementations that don't support the optional 'line'
>>argument. I guess the best way to do it in C code would be to see if
>>PyFunction_GetDefaults() returns a tuple of length two (since
>>showwarning() already has a single optional argument as it is).
>
>Hi Brett,
>
>I'm still seeing some strange behavior from the warnings module.  This
>can be observed on the community buildbot for Twisted, for example:
>
>http://python.org/dev/buildbot/community/trunk/x86%20Ubuntu%20Hardy%20trunk/builds/171/step-Twisted.zope.stable/0
>
>The log ends with basically all of the warning-related tests in Twisted
>failing, reporting that no warnings happened.

Just to follow up on this part, the failures are due to the tests expecting
to be able to override a different function in the warnings module
(warn_explicit), not showwarning.  We used warn_explicit because there's no
clear way to disable the filtering that gets applied to showwarning.
warn_explicit doesn't claim to be a public hook, so I guess I won't complain
about this. :)
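
For reference, a showwarning replacement that accepts the optional 'line'
argument might look like the sketch below (Python 3 syntax). The names
record_warning, supports_line_argument and warn_with_hook are made up for
illustration, and the defaults check is only a Python-level analogue of
the PyFunction_GetDefaults() idea quoted above, not the actual C code.

```python
import warnings

captured = []


def record_warning(message, category, filename, lineno, file=None, line=None):
    """Replacement hook with both optional arguments, 'file' and 'line'."""
    captured.append((category.__name__, str(message)))


def supports_line_argument(func):
    """True if func declares at least two optional arguments."""
    return len(func.__defaults__ or ()) >= 2


def warn_with_hook(text):
    """Issue one warning while record_warning is installed as the hook."""
    original = warnings.showwarning
    warnings.showwarning = record_warning
    try:
        with warnings.catch_warnings():
            # Make sure the filters do not swallow the warning.
            warnings.simplefilter("always")
            warnings.warn(text)
    finally:
        warnings.showwarning = original


warn_with_hook("foo")
print(captured)
```

A hook written against the older five-argument signature would fail the
supports_line_argument() check, which is the case the proposed
deprecation warning is meant to flag.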

The below behavior still seems wrong to me, though.

>There is also some strange behavior that can be easily observed in the REPL:
>
>    exarkun at boson:~/Projects/python/trunk$ ./python 
>/home/exarkun/Projects/Divmod/trunk/Combinator/combinator/xsite.py:7: DeprecationWarning: the sets module is deprecated
>      from sets import Set
>    Python 2.6a2+ (trunk:62636M, May  2 2008, 09:19:41)
>    [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
>    Type "help", "copyright", "credits" or "license" for more information.
>    >>> import warnings
>    >>> warnings.warn("foo")
>    :1: UserWarning: foo       # Where'd the module name go?
>    >>> def f(*a):
>    ...     print a
>    ...
>    >>> warnings.showwarning = f
>    >>> warnings.warn("foo")
>    >>>                        # Where'd the warning go?
>
>Any ideas on this?
>
>Jean-Paul

From guido at python.org  Fri May  2 15:56:33 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 2 May 2008 06:56:33 -0700
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <481AD11C.4020806@cheimes.de>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<481AD11C.4020806@cheimes.de>
Message-ID: <ca471dc20805020656n5288b34y9175a1df451088d8@mail.gmail.com>

I'm withdrawing my opposition in the light of the sheer number of
words that have already been written about this.

On Fri, May 2, 2008 at 1:30 AM, Christian Heimes <lists at cheimes.de> wrote:
> Guido van Rossum wrote:
>
> > I like this, except one issue: I really don't like the .local
>  > directory. I don't see any compelling reason why this needs to be
>  > ~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide
>  > it from view, especially since the user is expected to manage this
>  > explicitly.
>
>  The directory name has been commented on by glyph at great length
>  (again). Thanks glyph! I'm all on his side. The base directory for
>  Python-related files should be a dot directory in the root of the
>  user's home dir. I slightly prefer ~/.local/ over other suggestions
>  but I'm also open to ~/.python.d/
>
>  Should I wait with the commit until we have agreed on a directory name
>  or do you want me to commit the code now?
>
>
>  > I might look at this later; but it seems to me to be a pure
>  > optimization and thus not required to be in before the first beta.
>
>  Correct, it's an optimization to enhance the memory utilization.
>
>  Christian
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ncoghlan at gmail.com  Fri May  2 15:59:46 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 02 May 2008 23:59:46 +1000
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
Message-ID: <481B1E52.908@gmail.com>

Barry Warsaw wrote:
 > Time is running short to get any new features into Python 2.6 and 3.0.
 > The release after this one is scheduled to be the first beta release, at
 > which time we will institute a feature freeze.  If your feature doesn't
 > make it in by then, you'll have to wait until 2.7/3.1.  If there is
 > something that absolutely must go into 2.6/3.0 be sure that there is a
 > bug issue open for it and that the Priority is set to 'release
 > blocker'.  I may reduce it to critical for the next alpha, but we'll
 > review all the release blocker and critical issues for the first 2.6 and
 > 3.0 beta releases.

I tried to bump http://bugs.python.org/issue643841 ("New class special 
method lookup change") up to release blocker, but the bug tracker still 
appears to be a bit flaky (it keeps giving me an error when I try to 
submit the change - unfortunately I can't submit anything about it to 
the metatracker, because I've forgotten my password for it and the 
metatracker is getting a connection refused when it tries to send the 
reminder email :P).

Here's the comment I was trying to submit along with the bug priority 
change:

"""Bumping the priority on this to release blocker for 3.0 - I think we 
need to have a good answer for the folks who've written old-style 
__getattr__ based auto-delegating classes before removing old-style 
classes entirely in 3.0.

We could get away with ignoring the issue in the past because people had 
the option of just using an old-style class rather than having to deal 
with the difficulties of doing this with a new-style class. With 3.0, 
that approach is being eliminated.

A ProxyMixin class written in Python would address that need (and 
shouldn't be particularly hard to write), but I'm not sure where it 
would go in the standard library."""
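
To make the problem concrete, here is a minimal sketch in Python 3 syntax.
With new-style classes, implicit special method lookups such as len(obj)
go to the type and bypass __getattr__, so a mixin has to define the
special methods explicitly. All names here (PlainProxy, ProxyMixin,
ListProxy, _forward, _target) are hypothetical, and only a small,
illustrative subset of special methods is forwarded.

```python
class PlainProxy:
    """Delegates ordinary attribute access through __getattr__ only."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        return getattr(self._target, name)


def _forward(name):
    """Build a method that forwards one special method to the target."""
    def method(self, *args, **kwargs):
        return getattr(self._target, name)(*args, **kwargs)
    method.__name__ = name
    return method


class ProxyMixin:
    """Hypothetical mixin that defines special methods on the class itself."""


# Illustrative subset; a real mixin would cover many more special methods.
for _name in ("__len__", "__getitem__", "__iter__", "__contains__"):
    setattr(ProxyMixin, _name, _forward(_name))


class ListProxy(ProxyMixin, PlainProxy):
    pass


proxied = ListProxy([1, 2, 3])
print(len(proxied), proxied[0], 2 in proxied)
```

Calling len() on a bare PlainProxy raises TypeError even though
PlainProxy([1, 2, 3]).count(2) works, which is the gap the mixin is
meant to close.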

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From fdrake at acm.org  Fri May  2 19:53:54 2008
From: fdrake at acm.org (Fred Drake)
Date: Fri, 2 May 2008 13:53:54 -0400
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
Message-ID: <B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>

On May 1, 2008, at 7:54 PM, Barry Warsaw wrote:
> Interesting.  I'm of the opposite opinion.  I really don't want  
> Python dictating to me what my home directory should look like (a  
> dot file doesn't count because so many tools conspire to hide it  
> from me).  I guess there's always $PYTHONUSERBASE, but I think I  
> will not be alone. ;)


Using ~/.local/ for user-managed content doesn't seem right to me at  
all, because it's hidden by default.

If user-local package installs went to ~/ by default (~/bin/ for  
scripts, ~/lib/python/ or ~/lib/pythonX.Y/ for modules and packages),  
with a way to set an alternate "prefix" instead of ~/ using a  
distutils configuration setting, I'd be happy enough.

I'd be even happier if there were no default per-user location, but a  
required configuration setting (in the existing distutils config  
locations) in order to enable per-user installation.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>


From janssen at parc.com  Fri May  2 20:03:14 2008
From: janssen at parc.com (Bill Janssen)
Date: Fri, 2 May 2008 11:03:14 PDT
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <481AD11C.4020806@cheimes.de> 
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<481AD11C.4020806@cheimes.de>
Message-ID: <08May2.110318pdt."58696"@synergy1.parc.xerox.com>

> I slightly prefer ~/.local/ over other suggestions
> but I'm also open to ~/.python.d/

Guido's point about it not being necessarily "local" is a good one.  I
use lots of computers; they all automount my home directory (~) from a
network file server.  Nothing under that directory should be
machine-specific.  My .login and .xinitrc scripts check the machine ID
and do different things on different machines.

Bill

From janssen at parc.com  Fri May  2 20:10:27 2008
From: janssen at parc.com (Bill Janssen)
Date: Fri, 2 May 2008 11:10:27 PDT
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <481AD771.6040802@cheimes.de> 
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
	<481AD58D.2010201@holdenweb.com> <481AD771.6040802@cheimes.de>
Message-ID: <08May2.111033pdt."58696"@synergy1.parc.xerox.com>

> Windows and Mac OS X have dedicated directories for application specific
> libraries. That is ~/Library on Mac and Application Data on Windows.

In fact, I had to write code for this, and had to read the specs for each.
Here's the code (I've substituted Python for UpLib):

import os
import sys

# Choose a per-user directory following each platform's conventions.
if sys.platform == 'darwin':
    listdir = os.path.expanduser(os.path.join("~", "Library", "Application Support", "org.python"))
elif sys.platform == 'win32':
    # Try the environment variables Windows provides, most specific first.
    if 'APPDATA' in os.environ:
        listdir = os.path.join(os.environ['APPDATA'], 'Python')
    elif 'USERPROFILE' in os.environ:
        listdir = os.path.join(os.environ['USERPROFILE'], 'Application Data', 'Python')
    elif 'HOMEDRIVE' in os.environ and 'HOMEPATH' in os.environ:
        # HOMEDRIVE and HOMEPATH together name the user's home directory.
        listdir = os.path.join(os.environ['HOMEDRIVE'], os.environ['HOMEPATH'], 'Python')
    else:
        listdir = os.path.join(os.path.expanduser("~"), 'Python')
else:
    # pretty much has to be unix
    listdir = os.path.expanduser(os.path.join("~", ".python"))


From guido at python.org  Fri May  2 21:56:35 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 2 May 2008 12:56:35 -0700
Subject: [Python-3000] Special offer! Ten code reviews
Message-ID: <ca471dc20805021256l3d4f906btb7635a662f4a21bd@mail.gmail.com>

I'd like to get some more people trying out codereview.appspot.com, so
here's an offer: for the first 10 people who submit a new patch there
for my review, I'll do the review by Monday.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org  Fri May  2 22:14:52 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 03 May 2008 05:14:52 +0900
Subject: [Python-3000] ~/.local [was: Reminder: last alphas next Wednesday
	07-May-2008]
In-Reply-To: <08May2.110318pdt."58696"@synergy1.parc.xerox.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<481AD11C.4020806@cheimes.de>
	<08May2.110318pdt."58696"@synergy1.parc.xerox.com>
Message-ID: <87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp>

Bill Janssen replied to Christian Heimes as follows::

 > > I slightly prefer ~/.local/ over other suggestions
 > > but I'm also open to ~/.python.d/
 > 
 > Guido's point about it not being necessarily "local" is a good one.

Christian Heimes (I think) wrote:

> Windows and Mac OS X have dedicated directories for application specific
> libraries. That is ~/Library on Mac and Application Data on Windows.

You're both missing the point of what's wanted here, I suspect.  I
can't speak for others, but I do want "~/.local" and I agree with the
uses Glyph suggests for it.  I grant that "local" may not be a good
word for it in the context of a personal system in a corporate
environment, but here's how I think about it.

What it means (to me in the context of Unix-y system organization) is
"this is where I put stuff that I would be happy to have as part of
the system I was given (by some authority: my boss, Microsoft, or
Brett Cannon's stdlib PEP), but for some reason I'm not comfortable/
permitted to install it as system software."

It could physically reside on the moon (given a tachyon backbone
<wink>) and, unlike Mac-ish ~/Library or "Application Data" on Windows,
data *about me* or my use of the application *does not* go there.


From janssen at parc.com  Fri May  2 22:26:12 2008
From: janssen at parc.com (Bill Janssen)
Date: Fri, 2 May 2008 13:26:12 PDT
Subject: [Python-3000] ~/.local [was: Reminder: last alphas next
	Wednesday 07-May-2008]
In-Reply-To: <87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp> 
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<481AD11C.4020806@cheimes.de>
	<08May2.110318pdt."58696"@synergy1.parc.xerox.com>
	<87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <08May2.132618pdt."58696"@synergy1.parc.xerox.com>

> What it means (to me in the context of Unix-y system organization) is
> "this is where I put stuff that I would be happy to have as part of
> the system I was given (by some authority: my boss, Microsoft, or
> Brett Cannon's stdlib PEP), but for some reason I'm not comfortable/
> permitted to install it as system software."

Yeah, I was just pointing out that for me, "~" ports across a number
of different machines, and putting stuff specific to any particular
machine in there needs more thought.  For UpLib, I generate machine
UUIDs from characteristics of the machine, using uuidgen, and store
compiled code and other machine specific things in a subdirectory with
that UUID.  Otherwise, we end up trying to execute PPC compiled shared
libraries on a SPARC platform, or Python 2.5 extensions with Python 2.3.
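
A rough sketch of the same idea in Python 3 syntax, using a readable tag
instead of a uuidgen-derived UUID. This is not UpLib's actual scheme: the
function names and the tag format are assumptions made up for
illustration.

```python
import os
import platform
import sys


def platform_tag():
    """Hypothetical tag combining OS, CPU architecture and Python version."""
    return "%s-%s-py%d.%d" % (
        platform.system().lower() or "unknown",
        platform.machine().lower() or "unknown",
        sys.version_info[0],
        sys.version_info[1],
    )


def machine_specific_dir(base):
    """Return the subdirectory of base reserved for this kind of machine."""
    return os.path.join(base, platform_tag())


print(machine_specific_dir(os.path.expanduser(os.path.join("~", ".python"))))
```

Compiled extensions stored under such a subdirectory would only be picked
up by interpreters with a matching tag, while pure-Python files could
still live in the shared parent directory.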

Bill

From solipsis at pitrou.net  Fri May  2 22:32:33 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 2 May 2008 20:32:33 +0000 (UTC)
Subject: [Python-3000] Special offer! Ten code reviews
References: <ca471dc20805021256l3d4f906btb7635a662f4a21bd@mail.gmail.com>
Message-ID: <loom.20080502T203054-257@post.gmane.org>

Guido van Rossum <guido <at> python.org> writes:
> 
> I'd like to get some more people trying out codereview.appspot.com, so
> here's an offer: for the first 10 people who submit a new patch there
> for my review, I'll do the review by Monday.

I just tried to submit a patch using the Web form, and got a 500 Server Error...


From musiccomposition at gmail.com  Fri May  2 23:09:05 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Fri, 2 May 2008 16:09:05 -0500
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
Message-ID: <1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com>

On Thu, May 1, 2008 at 11:41 AM, Guido van Rossum <guido at python.org> wrote:
> Some of you may have seen a video recorded in November 2006 where I
>  showed off Mondrian, a code review tool that I was developing for
>  Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped
>  that I could release Mondrian as open source, but it was not to be:
>  due to its popularity inside Google, it became more and more tied to
>  proprietary Google infrastructure like Bigtable, and it remained
>  limited to Perforce, the commercial revision control system most used
>  at Google.

I was salivating over that video, so I'm really excited to be able to
try out something like it now.

>  Don't hesitate to drop me a note with feedback -- note though that
>  there are a few known issues listed at the end of the Help page. The
>  Help page is really a wiki, so feel free to improve it!

My request at the moment is to let people use their real names for
display; my email address does not at all resemble my name.


-- 
Cheers,
Benjamin Peterson

From musiccomposition at gmail.com  Fri May  2 23:13:47 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Fri, 2 May 2008 16:13:47 -0500
Subject: [Python-3000] Special offer! Ten code reviews
In-Reply-To: <loom.20080502T203054-257@post.gmane.org>
References: <ca471dc20805021256l3d4f906btb7635a662f4a21bd@mail.gmail.com>
	<loom.20080502T203054-257@post.gmane.org>
Message-ID: <1afaf6160805021413r3527734cna2a36f1dd6dd5204@mail.gmail.com>

On Fri, May 2, 2008 at 3:32 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>  I just tried to submit a patch using the Web form, and got a 500 Server Error...

It's been fixed.


-- 
Cheers,
Benjamin Peterson

From guido at python.org  Fri May  2 23:25:52 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 2 May 2008 14:25:52 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com>
Message-ID: <ca471dc20805021425s2fb2eefaq73ebb03e783b91e6@mail.gmail.com>

On Fri, May 2, 2008 at 2:09 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> On Thu, May 1, 2008 at 11:41 AM, Guido van Rossum <guido at python.org> wrote:
>  > Some of you may have seen a video recorded in November 2006 where I
>  >  showed off Mondrian, a code review tool that I was developing for
>  >  Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped
>  >  that I could release Mondrian as open source, but it was not to be:
>  >  due to its popularity inside Google, it became more and more tied to
>  >  proprietary Google infrastructure like Bigtable, and it remained
>  >  limited to Perforce, the commercial revision control system most used
>  >  at Google.
>
>  I was salivating over that video, so I'm really excited to be able to
>  try out something like it now.
>
>
>  >  Don't hesitate to drop me a note with feedback -- note though that
>  >  there are a few known issues listed at the end of the Help page. The
>  >  Help page is really a wiki, so feel free to improve it!
>
>  My request at the moment is to let people use their real names for
>  display; my email address does not at all resemble my name.

I've noticed. Surely there's an interesting story there. :-)

The feature request is on my TODO list. The design is a bit involved,
since I'd have to ask people to register and maintain a userid ->
nickname mapping; the Google Account API we're piggybacking on only
gives you the email address. Once it's open sourced (Monday?) I'd love
to see contributions like this!

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From musiccomposition at gmail.com  Fri May  2 23:28:55 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Fri, 2 May 2008 16:28:55 -0500
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <ca471dc20805021425s2fb2eefaq73ebb03e783b91e6@mail.gmail.com>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com>
	<ca471dc20805021425s2fb2eefaq73ebb03e783b91e6@mail.gmail.com>
Message-ID: <1afaf6160805021428s397cd8eer425fe9712e21ede2@mail.gmail.com>

On Fri, May 2, 2008 at 4:25 PM, Guido van Rossum <guido at python.org> wrote:
>  >  My request at the moment is to let people use their real names for
>  >  display; my email address does not at all resemble my name.
>
>  I've noticed. Surely there's an interesting story there. :-)

Maybe I'll tell you why at the next PyCon...

One more question: What's the number on the upper right hand corner by
my username?


-- 
Cheers,
Benjamin Peterson

From brett at python.org  Fri May  2 23:27:48 2008
From: brett at python.org (Brett Cannon)
Date: Fri, 2 May 2008 14:27:48 -0700
Subject: [Python-3000] [Python-Dev] warnings.showwarning (was Re:
	Reminder: last alphas next Wednesday 07-May-2008)
In-Reply-To: <20080502134716.6859.1256230877.divmod.quotient.58108@ohm>
References: <20080502133249.6859.2057657874.divmod.quotient.58104@ohm>
	<20080502134716.6859.1256230877.divmod.quotient.58108@ohm>
Message-ID: <bbaeab100805021427h2e052099yb2353e6d66ca55b3@mail.gmail.com>

On Fri, May 2, 2008 at 6:47 AM, Jean-Paul Calderone <exarkun at divmod.com> wrote:
[SNIP]
> > Hi Brett,
> >
> > I'm still seeing some strange behavior from the warnings module,  This
> > can be observed on the community buildbot for Twisted, for example:
> >
> >
> > http://python.org/dev/buildbot/community/trunk/x86%20Ubuntu%20Hardy%20trunk/builds/171/step-Twisted.zope.stable/0
> >
> > The log ends with basically all of the warning-related tests in Twisted
> > failing, reporting that no warnings happened.
> >
>
>  Just to follow up on this part, the failures are due to the tests expecting
>  to be able to override a different function in the warnings module, not
>  showwarning (warn_explicit).  We used warn_explicit because there's no
>  clear way to disable the filtering that gets applied to showwarning.
>  warn_explicit doesn't claim to be a public hook, so I guess I won't
>  complain about this. :)
>

Yeah, you guys are being naughty by replacing that and expecting stuff
still to work. =)

>  The below behavior still seems wrong to me, though.
>
>
> > There is also some strange behavior that can be easily observed in the
> > REPL:
> >
> >   exarkun at boson:~/Projects/python/trunk$ ./python
> >   /home/exarkun/Projects/Divmod/trunk/Combinator/combinator/xsite.py:7: DeprecationWarning: the sets module is deprecated
> >     from sets import Set
> >   Python 2.6a2+ (trunk:62636M, May  2 2008, 09:19:41)
> >   [GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
> >   Type "help", "copyright", "credits" or "license" for more information.
> >   >>> import warnings
> >   >>> warnings.warn("foo")
> >   :1: UserWarning: foo       # Where'd the module name go?
> >   >>> def f(*a):
> >   ...     print a
> >   ...
> >   >>> warnings.showwarning = f
> >   >>> warnings.warn("foo")
> >   >>>                        # Where'd the warning go?
> >
> > Any ideas on this?

If I run this in a stock 2.5 interpreter I get something similar,
apart from the missing '__main__'. If I run it with PYTHONSTARTUP set it
actually uses that module for some reason as the source.

I created issue2743 to fix the output at the interpreter, but I made
it a critical bug since it is only at the interpreter (and thus
breaking people's code will be small), but it should still be fixed
since 'warnings' is a core piece of infrastructure.
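
For anyone reproducing this on a current 3.x interpreter, here is a minimal
sketch of replacing showwarning. The names `record` and `captured` are made
up for illustration. The filter is forced to "always" inside
catch_warnings() so that filtering cannot suppress the call; that is only a
precaution in this sketch, not a diagnosis of the session quoted above.

```python
import warnings

captured = []

def record(message, category, filename, lineno, file=None, line=None):
    # Same signature as warnings.showwarning; store instead of printing.
    captured.append((category.__name__, str(message)))

# catch_warnings() restores the filters and showwarning on exit.
with warnings.catch_warnings():
    warnings.simplefilter("always")   # do not let the filters drop anything
    warnings.showwarning = record     # install the replacement hook
    warnings.warn("foo")

print(captured)
```

catch_warnings(record=True) gives a similar result without replacing the
hook by hand.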

-Brett


From guido at python.org  Fri May  2 23:39:54 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 2 May 2008 14:39:54 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <1afaf6160805021428s397cd8eer425fe9712e21ede2@mail.gmail.com>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<1afaf6160805021409l120fc02ag482af942eead0842@mail.gmail.com>
	<ca471dc20805021425s2fb2eefaq73ebb03e783b91e6@mail.gmail.com>
	<1afaf6160805021428s397cd8eer425fe9712e21ede2@mail.gmail.com>
Message-ID: <ca471dc20805021439g2d3b1270v28870e85db082c0@mail.gmail.com>

On Fri, May 2, 2008 at 2:28 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
>  One more question: What's the number on the upper right hand corner by
>  my username?

It's a debugging counter. It gets reset each time a new service
instance is created.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Sat May  3 00:33:15 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 2 May 2008 18:33:15 -0400
Subject: [Python-3000] Displaying strings containing unicode escapes
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<4805ECE1.6040501@gmail.com><20080416124529.GC8598@phd.pp.ru>	<4805FD56.6070902@gmail.com><20080416133046.GB16087@phd.pp.ru>	<480612CE.1010300@gmail.com>	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com><87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>	<4819F21D.8070808@v.loewis.de><fvcvrd$j4a$1@ger.gmane.org>
	<481AF228.2080900@gmail.com>
Message-ID: <fvg4rc$48q$1@ger.gmane.org>


"Nick Coghlan" <ncoghlan at gmail.com> wrote in message 
news:481AF228.2080900 at gmail.com...
| Terry Reedy wrote:
| > I think standard Python should somehow have two options: escape everything
| > but ASCII (for unambiguity and old display systems) and escape nothing that
| > is potentially printable (leaving partially capable systems to fare as they
| > will).  In-between solutions will ultimately be programmer and system
| > specific.
|
| If repr() is made to work as Martin suggests (i.e. only escape the
| unprintable stuff), then the unicode_escape codec can be used fairly
| easily to restore the 2.x escape everything non-ASCII behaviour.

so print(s.encode('unicode_escape')) ?
Fine with me, especially if that or whatever is added to the repr() doc. 
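
A small sketch of what that call yields in 3.x (the sample string is made
up): encode() returns bytes, so a decode back to str is needed before
printing if the b'...' wrapper is not wanted.

```python
s = "na\u00efve \u3042"                 # sample text with two non-ASCII characters
escaped = s.encode("unicode_escape")    # bytes, every non-ASCII character escaped
text = escaped.decode("ascii")          # back to str, now pure ASCII

print(escaped)
print(text)
```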


From stephen at xemacs.org  Sat May  3 01:42:49 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 03 May 2008 08:42:49 +0900
Subject: [Python-3000] ~/.local [was: Reminder: last alphas next
	Wednesday 07-May-2008]
In-Reply-To: <08May2.132618pdt."58696"@synergy1.parc.xerox.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<481AD11C.4020806@cheimes.de>
	<08May2.110318pdt."58696"@synergy1.parc.xerox.com>
	<87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp>
	<08May2.132618pdt."58696"@synergy1.parc.xerox.com>
Message-ID: <87iqxww9vq.fsf@uwakimon.sk.tsukuba.ac.jp>

Bill Janssen writes:

 > Yeah, I was just pointing out that for me, "~" ports across a number
 > of different machines, and putting stuff specific to any particular
 > machine in there needs more thought.

Sure.  But AIUI that's not the problem that "~/.local" is intended to
solve.  Also, it's a generic problem of networked environments, not in
any way limited to "~", which should be susceptible to the usual
solutions for multiarchitecture installations (eg subdirectories named
by GNU's CPU-OS-VENDOR convention, or your UUID convention).  In
particular, "pure Python" programs shouldn't much care, right?

From janssen at parc.com  Sat May  3 02:51:53 2008
From: janssen at parc.com (Bill Janssen)
Date: Fri, 2 May 2008 17:51:53 PDT
Subject: [Python-3000] ~/.local [was: Reminder: last alphas next
	Wednesday 07-May-2008]
In-Reply-To: <87iqxww9vq.fsf@uwakimon.sk.tsukuba.ac.jp> 
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<481AD11C.4020806@cheimes.de>
	<08May2.110318pdt."58696"@synergy1.parc.xerox.com>
	<87lk2swjib.fsf@uwakimon.sk.tsukuba.ac.jp>
	<08May2.132618pdt."58696"@synergy1.parc.xerox.com>
	<87iqxww9vq.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <08May2.175159pdt."58696"@synergy1.parc.xerox.com>

> In particular, "pure Python" programs shouldn't much care, right?

With the addition of ctypes, "pure" Python programs aren't so pure
anymore.  But even that should work across architectures, right?

> Also, it's a generic problem of networked environments, not in
> any way limited to "~", which should be susceptible to the usual
> solutions for multiarchitecture installations (eg subdirectories named
> by GNU's CPU-OS-VENDOR convention, or your UUID convention).

Yep.  I'm just pointing out that networked environments are becoming
more common, not less common.

Bill

From skip at pobox.com  Sat May  3 02:03:20 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 2 May 2008 19:03:20 -0500
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
Message-ID: <18459.43976.85481.758104@montanaro-dyndns-org.local>


    Fred> If user-local package installs went to ~/ by default ... with a
    Fred> way to set an alternate "prefix" instead of ~/ using a distutils
    Fred> configuration setting, I'd be happy enough.

+1 from me.

Skip

From ishimoto at gembook.org  Sat May  3 02:54:24 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Sat, 3 May 2008 09:54:24 +0900
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <fvg4rc$48q$1@ger.gmane.org>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>
	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>
	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4819F21D.8070808@v.loewis.de> <fvcvrd$j4a$1@ger.gmane.org>
	<481AF228.2080900@gmail.com> <fvg4rc$48q$1@ger.gmane.org>
Message-ID: <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>

On Sat, May 3, 2008 at 7:33 AM, Terry Reedy <tjreedy at udel.edu> wrote:
>  so print(s.encode('unicode_escape')) ?
>  Fine with me, especially if that or whatever is added to the repr() doc.
>

I don't recommend repr(obj).encode('unicode_escape'), because
backslash characters in the string will be escaped again by the codec.

>>> print(repr("\\"))
'\\'
>>> print(str(repr("\\").encode("unicode-escape"), "ASCII"))
'\\\\'

'ASCII' codec with 'backslashreplace' error handler works better.

>>> print(str(repr("\\").encode("ASCII", "backslashreplace"), "ASCII"))
'\\'

It looks complicated to get the same result as Python 2.x. I originally
proposed allowing print(repr('\\'), encoding="ASCII",
errors="backslashreplace") to get the same result, but this is hard to
implement.

If requirement for ASCII-repr is popular enough, we can provide a
built-in function like this:

def repr_ascii(obj):
    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

2to3 can use repr_ascii() for better compatibility.

Is new built-in function desirable, or just document is good enough?
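
The difference between the two approaches can be sketched side by side for
a string that holds both a non-ASCII character and a backslash.
repr_via_unicode_escape is a hypothetical helper added only for comparison,
and the sample string is made up.

```python
def repr_ascii(obj):
    # Same body as the function proposed above.
    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

def repr_via_unicode_escape(obj):
    # Hypothetical helper: the codec also escapes the backslashes that
    # repr() has already doubled.
    return str(repr(obj).encode("unicode_escape"), "ASCII")

s = "\u3042\\"   # HIRAGANA LETTER A followed by one backslash

print(repr_ascii(s))               # backslash stays doubled once, as in repr()
print(repr_via_unicode_escape(s))  # backslash is doubled a second time
```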

From tjreedy at udel.edu  Sat May  3 02:02:56 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 2 May 2008 20:02:56 -0400
Subject: [Python-3000] PEP 8 Style Guide and Python 3
Message-ID: <fvga3g$ga1$1@ger.gmane.org>

At least one of the style recommendations in PEP 8 -- use class rather than
string exceptions -- is obsolete in Py 3.  And there are others, and 
perhaps others where the spirit of the recommendation is the same but 
details are different.

For a new Python 3 programmer who does not need or want to know anything 
about Python 2, reading about 'string exceptions' would be confusing.
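
As a sketch of why the advice is moot in 3.x (ParseError and raise_string
are made-up names): raising a plain string is rejected at run time, so
only class-based exceptions remain.

```python
class ParseError(Exception):
    """Hypothetical application error; exceptions must be classes in 3.x."""

def raise_string():
    # Python 2 style string exception; 3.x refuses it with a TypeError.
    raise "parse failed"

try:
    raise_string()
except TypeError as exc:
    string_outcome = type(exc).__name__

try:
    raise ParseError("parse failed")
except ParseError as exc:
    class_outcome = str(exc)

print(string_outcome, class_outcome)
```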

One possibility for isolation is for each major section to have separate 
2.x and 3.x subsections.  But where there are several scattered changes, 
this would require large chunks of duplication.  For instance, under

Prescriptive: Naming Conventions
     Package and Module Names
          Modules should have...

becomes Modules must have ... (I presume, hence the renaming project).
But all three paragraphs would have to be duplicated in 2.x and 3.x to be 
coherent, and then they would not be in their sensible place.

A couple of paragraphs on, 'because exceptions should be classes' becomes 
'because exceptions are classes'.  Again, moving two variants to 2.x and 
3.x sections would be awkward.

So, especially if PEP 8 is considered more or less frozen, I suggest the 
possibility of a new PEP 3008, Python 3 style guide.

Terry Jan Reedy


From martin at v.loewis.de  Sat May  3 09:34:55 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 03 May 2008 09:34:55 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>	<4819F21D.8070808@v.loewis.de>
	<fvcvrd$j4a$1@ger.gmane.org>	<481AF228.2080900@gmail.com>
	<fvg4rc$48q$1@ger.gmane.org>
	<797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>
Message-ID: <481C159F.9080409@v.loewis.de>

> Is new built-in function desirable, or just document is good enough?

Traditionally, I take the position that new built-in functions are
rarely desirable; this one is no exception.

Regards,
Martin

From ncoghlan at gmail.com  Sat May  3 10:48:52 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 03 May 2008 18:48:52 +1000
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <481C159F.9080409@v.loewis.de>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>	<4819F21D.8070808@v.loewis.de>	<fvcvrd$j4a$1@ger.gmane.org>	<481AF228.2080900@gmail.com>	<fvg4rc$48q$1@ger.gmane.org>	<797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>
	<481C159F.9080409@v.loewis.de>
Message-ID: <481C26F4.1030700@gmail.com>

Martin v. Löwis wrote:
>> Is new built-in function desirable, or just document is good enough?
> 
> Traditionally, I take the position that new built-in functions are
> rarely desirable; this one is no exception.

I agree with that, but string.repr_ascii may be a reasonable thing to add.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sat May  3 11:05:43 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 03 May 2008 19:05:43 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <18459.43976.85481.758104@montanaro-dyndns-org.local>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
Message-ID: <481C2AE7.9010805@gmail.com>

skip at pobox.com wrote:
>     Fred> If user-local package installs went to ~/ by default ... with a
>     Fred> way to set an alternate "prefix" instead of ~/ using a distutils
>     Fred> configuration setting, I'd be happy enough.
> 
> +1 from me.

But then we clutter up people's (read *my*) home directory with no way 
for them to do anything about it. We should stay out of people's way by 
default, while making it easy for them to poke around if they want to. 
The ~/.local convention does that, but using ~/ directly does not.

The major reasons why I think staying out of people's way by default is 
important:
- for people like me (glyph, Georg, etc), it allows us to keep our home 
directory organised the way we like it. As far as I am concered, 
applications can store whatever user-specific configuration and data 
files they like inside hidden files or directories, but they shouldn't 
be inflicting any visible files on me that aren't related to things I am 
working on.
- for novice users, the fact that it's hidden helps keep them from 
deleting it by accident
- for experienced users (Barry, skip, etc) that want ~/.local to be more 
easily accessible, creating a visible ~/local symlink is an utterly 
trivial exercise.

Switching the default to use public directories instead of hidden ones 
helps the third group at the expense of the first two groups. Given that 
the third group already has an easy workaround to get the behaviour they 
want, that seems like a bad trade-off to me.
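
The symlink workaround for the third group can be sketched as follows. The
helper name expose_hidden_dir is made up, and a temporary directory stands
in for the real home directory so nothing outside it is touched.

```python
import os
import tempfile

def expose_hidden_dir(home, hidden=".local", visible="local"):
    """Create home/visible as a symlink to home/hidden if it is absent."""
    target = os.path.join(home, hidden)
    link = os.path.join(home, visible)
    os.makedirs(target, exist_ok=True)
    if not os.path.lexists(link):
        # Relative target, so the link stays valid wherever home is mounted.
        os.symlink(hidden, link)
    return link

with tempfile.TemporaryDirectory() as fake_home:
    link = expose_hidden_dir(fake_home)
    is_link = os.path.islink(link)
    same_dir = os.path.samefile(link, os.path.join(fake_home, ".local"))

print(is_link, same_dir)
```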

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sat May  3 11:08:13 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 03 May 2008 19:08:13 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <08May2.110318pdt."58696"@synergy1.parc.xerox.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<481AD11C.4020806@cheimes.de>
	<08May2.110318pdt."58696"@synergy1.parc.xerox.com>
Message-ID: <481C2B7D.7000403@gmail.com>

Bill Janssen wrote:
>> I slightly prefer ~/.local/ over other suggestions
>> but I'm also open to ~/.python.d/
> 
> Guido's point about it not being necessarily "local" is a good one.  I
> use lots of computers; they all automount my home directory (~) from a
> network file server.  Nothing under that directory should be
> machine-specific.  My .login and .xinitrc scripts check the machine ID
> and do different things on different machines.

So long as the machine-specific stuff gets installed to architecture 
specific directories as they do under /usr/local, I don't see why this 
would be a problem.
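
A rough sketch of that idea, assuming a purely hypothetical layout of
<base>/lib/<platform tag>; sysconfig.get_platform() is a real call, but the
directory scheme and the helper name arch_specific_dir are made up for
illustration.

```python
import os
import sysconfig

def arch_specific_dir(base):
    """Return base/lib/<platform tag>, a hypothetical per-architecture layout."""
    return os.path.join(base, "lib", sysconfig.get_platform())

# A shared, automounted home can then hold one subtree per architecture.
user_base = os.path.join(os.path.expanduser("~"), ".local")
print(arch_specific_dir(user_base))
```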

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Sat May  3 18:07:03 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 3 May 2008 09:07:03 -0700
Subject: [Python-3000] PEP 8 Style Guide and Python 3
In-Reply-To: <fvga3g$ga1$1@ger.gmane.org>
References: <fvga3g$ga1$1@ger.gmane.org>
Message-ID: <ca471dc20805030907y2561e334x62e7d79295d62318@mail.gmail.com>

I'd much rather stick with a single style guide; PEP 8 can be revised
as needed. I suggest that we preface the 2.x-specific things with
words like "in Python 2, ..." but by and large focus the style guide
on Py3k. We could even migrate the rules that are only relevant to 2.x
to an Appendix-like chapter. That can then be easily deleted at some
point in the future.

On Fri, May 2, 2008 at 5:02 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> At least one of the style recommendations in PEP 8 -- use class rather than
>  string exceptions -- is obsolete in Py 3.  And there are others, and
>  perhaps others where the spirit of the recommendation is the same but
>  details are different.
>
>  For a new Python 3 programmer who does not need or want to know anything
>  about Python 2, reading about 'string exceptions' would be confusing.
>
>  One possibility for isolation is for each major section to have separate
>  2.x and 3.x subsections.  But where there are several scattered changes,
>  this would require large chunks of duplication.  For instance, under
>
>  Prescriptive: Naming Conventions
>      Package and Module Names
>           Modules should have...
>
>  becomes Modules must have ... (I presume, hence the renaming project).
>  But all three paragraphs would have to be duplicated in 2.x and 3.x to be
>  coherent, and then they would not be in their sensible place.
>
>  A couple of paragraphs on, 'because exceptions should be classes' becomes
>  'because exceptions are classes'.  Again, moving two variants to 2.x and
>  3.x sections would be awkward.
>
>  So, especially if PEP 8 is considered more or less frozen, I suggest the
>  possibility of a new PEP 3008, Python 3 style guide.
>
>  Terry Jan Reedy
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Sat May  3 13:51:40 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 3 May 2008 06:51:40 -0500
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <481C2AE7.9010805@gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
Message-ID: <18460.20940.882777.235301@montanaro-dyndns-org.local>


    Nick> skip at pobox.com wrote:
    Fred> If user-local package installs went to ~/ by default ... with a
    Fred> way to set an alternate "prefix" instead of ~/ using a distutils
    Fred> configuration setting, I'd be happy enough.

    Skip> +1 from me.

    Nick> But then we clutter up people's (read *my*) home directory with no
    Nick> way for them to do anything about it. 

Fred asked for a --prefix flag (which is what I was voting on).  I don't
really care what you do by default as long as you give me a way to do it
differently.

Skip

From skip at pobox.com  Sat May  3 17:08:32 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sat, 3 May 2008 10:08:32 -0500
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <BF246FDC-6968-4523-8FF2-FB4B98377BF7@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
	<BF246FDC-6968-4523-8FF2-FB4B98377BF7@python.org>
Message-ID: <18460.32752.462983.25145@montanaro-dyndns-org.local>


    >> - for experienced users (Barry, skip, etc) that want ~/.local to be
    >>   more easily accessible, creating a visible ~/local symlink is an
    >>   utterly trivial exercise.

    Barry> Hey Nick, I agree with everything above, except that I'd probably
    Barry> put myself more in Glyph's camp :).  Can't speak for Skip
    Barry> though...

I already install everything in ~/local and just have ~/local/bin in my
PATH.  If I lived in a truly platform-dependent world I'd add
platform-dependent ~/local-plat1, ~/local/plat2, etc directories and extend
PATH a bit more.

Skip

From stephen at xemacs.org  Sat May  3 13:02:14 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 03 May 2008 20:02:14 +0900
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <481C26F4.1030700@gmail.com>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>
	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>
	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4819F21D.8070808@v.loewis.de> <fvcvrd$j4a$1@ger.gmane.org>
	<481AF228.2080900@gmail.com> <fvg4rc$48q$1@ger.gmane.org>
	<797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>
	<481C159F.9080409@v.loewis.de> <481C26F4.1030700@gmail.com>
Message-ID: <87ej8jwszt.fsf@uwakimon.sk.tsukuba.ac.jp>

Nick Coghlan writes:
 > Martin v. Löwis wrote:
 > >> Is new built-in function desirable, or just document is good enough?
 > > 
 > > Traditionally, I take the position that new built-in functions are
 > > rarely desirable; this one is no exception.
 > 
 > I agree with that, but string.repr_ascii may be a reasonable thing to add.

But this is basically completely a codec issue.  We have an internal
representation, and we want to translate it in a stream-oriented way
to an external representation.  Unless there's an efficiency issue,
why not just provide a hook for a codec?


From barry at python.org  Sat May  3 15:10:30 2008
From: barry at python.org (Barry Warsaw)
Date: Sat, 3 May 2008 09:10:30 -0400
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <481C2AE7.9010805@gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
Message-ID: <BF246FDC-6968-4523-8FF2-FB4B98377BF7@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 3, 2008, at 5:05 AM, Nick Coghlan wrote:
>
> The major reasons why I think staying out of people's way by default  
> is important:
> - for people like me (glyph, Georg, etc), it allows us to keep our  
> home directory organised the way we like it. As far as I am  
> concerned, applications can store whatever user-specific
> configuration and data files they like inside hidden files or  
> directories, but they shouldn't be inflicting any visible files on  
> me that aren't related to things I am working on.
> - for novice users, the fact that it's hidden helps keep them from  
> deleting it by accident
> - for experienced users (Barry, skip, etc) that want ~/.local to be  
> more easily accessible, creating a visible ~/local symlink is an  
> utterly trivial exercise.

Hey Nick, I agree with everything above, except that I'd probably put  
myself more in Glyph's camp :).  Can't speak for Skip though...

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSBxkSXEjvBPtnXfVAQKSigP/d6HIeQ5QLZR4QZ7GAIttb0d+8JI6PM0e
3E2+br0jZ9IeDwjjCLIAx1kbfgIX56++NGoU7tQqiQtbcapI3H3Vb+X+VSAcs30L
ORj709MDtF2oqXSzEHww5HHeKoZiQ8/FfiaZoXrXzqPVP5k9MSZu1zLrT3rpWAUP
8YLFekz/LUA=
=l5be
-----END PGP SIGNATURE-----

From ncoghlan at gmail.com  Sat May  3 20:00:05 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 04 May 2008 04:00:05 +1000
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <87ej8jwszt.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>	<4819F21D.8070808@v.loewis.de>	<fvcvrd$j4a$1@ger.gmane.org>	<481AF228.2080900@gmail.com>	<fvg4rc$48q$1@ger.gmane.org>	<797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>	<481C159F.9080409@v.loewis.de>	<481C26F4.1030700@gmail.com>
	<87ej8jwszt.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <481CA825.8010606@gmail.com>

Stephen J. Turnbull wrote:
> Nick Coghlan writes:
>  > Martin v. Löwis wrote:
>  > >> Is new built-in function desirable, or just document is good enough?
>  > > 
>  > > Traditionally, I take the position that new built-in functions are
>  > > rarely desirable; this one is no exception.
>  > 
>  > I agree with that, but string.repr_ascii may be a reasonable thing to add.
> 
> But this is basically completely a codec issue.  We have an internal
> representation, and we want to translate it in a stream-oriented way
> to an external representation.  Unless there's an efficiency issue,
> why not just provide a hook for a codec?

It would just be a convenience function to do a string to string 
conversion in code. I agree for an actual output stream you could just 
set the encoding to ASCII with backslashreplace error handling.
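
For the stream case, a minimal sketch using an in-memory buffer in place of
a real output stream (the names raw and out are made up):

```python
import io

raw = io.BytesIO()
# newline="\n" keeps the written bytes identical on every platform.
out = io.TextIOWrapper(raw, encoding="ascii",
                       errors="backslashreplace", newline="\n")

print(repr("caf\u00e9"), file=out)   # the non-ASCII character is escaped on write
out.flush()

written = raw.getvalue()
print(written)
```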

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Sat May  3 20:13:08 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 04 May 2008 04:13:08 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <BF246FDC-6968-4523-8FF2-FB4B98377BF7@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
	<BF246FDC-6968-4523-8FF2-FB4B98377BF7@python.org>
Message-ID: <481CAB34.6040809@gmail.com>

Barry Warsaw wrote:
> On May 3, 2008, at 5:05 AM, Nick Coghlan wrote:
>> - for experienced users (Barry, skip, etc) that want ~/.local to be 
>> more easily accessible, creating a visible ~/local symlink is an 
>> utterly trivial exercise.
> 
> Hey Nick, I agree with everything above, except that I'd probably put 
> myself more in Glyph's camp :).  Can't speak for Skip though...

I was actually looking at something Fred wrote and managed to misread it 
as something you had posted - and it turns out Skip was just agreeing 
with Fred about the 'provide an option to tell distutils to use a 
different user-specific directory name than the default one' idea, and 
isn't particularly worried about where the packages go by default.

Sorry for the confusion.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From phd at phd.pp.ru  Sat May  3 22:02:14 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sun, 4 May 2008 00:02:14 +0400
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>
References: <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>
	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>
	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4819F21D.8070808@v.loewis.de> <fvcvrd$j4a$1@ger.gmane.org>
	<481AF228.2080900@gmail.com> <fvg4rc$48q$1@ger.gmane.org>
	<797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>
Message-ID: <20080503200214.GA32314@phd.pp.ru>

On Sat, May 03, 2008 at 09:54:24AM +0900, Atsuo Ishimoto wrote:
> If requirement for ASCII-repr is popular enough, we can provide a
> built-in function like this:
> 
> def repr_ascii(obj):
>     return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

   It is hard to apply the function for repr(container).
repr(container).encode("unicode_escape") is the only way (at least I don't
see any other way).

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From martin at v.loewis.de  Sat May  3 22:20:43 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 03 May 2008 22:20:43 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <20080503200214.GA32314@phd.pp.ru>
References: <87wsmxpc0r.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730804290520g6982518fkf25ac81bf7ea9260@mail.gmail.com>	<87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>	<4819F21D.8070808@v.loewis.de>
	<fvcvrd$j4a$1@ger.gmane.org>	<481AF228.2080900@gmail.com>
	<fvg4rc$48q$1@ger.gmane.org>	<797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>
	<20080503200214.GA32314@phd.pp.ru>
Message-ID: <481CC91B.307@v.loewis.de>

> On Sat, May 03, 2008 at 09:54:24AM +0900, Atsuo Ishimoto wrote:
>> If requirement for ASCII-repr is popular enough, we can provide a
>> built-in function like this:
>>
>> def repr_ascii(obj):
>>     return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
> 
>    It is hard to apply the function for repr(container).
> repr(container).encode("unicode_escape") is the only way (at least I don't
> see any other way).

I think Atsuo envisioned you to invoke "repr_ascii(container)".

Regards,
Martin


From phd at phd.pp.ru  Sat May  3 22:36:17 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sun, 4 May 2008 00:36:17 +0400
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <481CC91B.307@v.loewis.de>
References: <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>
	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4819F21D.8070808@v.loewis.de> <fvcvrd$j4a$1@ger.gmane.org>
	<481AF228.2080900@gmail.com> <fvg4rc$48q$1@ger.gmane.org>
	<797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>
	<20080503200214.GA32314@phd.pp.ru> <481CC91B.307@v.loewis.de>
Message-ID: <20080503203617.GA1658@phd.pp.ru>

On Sat, May 03, 2008 at 10:20:43PM +0200, "Martin v. Löwis" wrote:
> > On Sat, May 03, 2008 at 09:54:24AM +0900, Atsuo Ishimoto wrote:
> >> If requirement for ASCII-repr is popular enough, we can provide a
> >> built-in function like this:
> >>
> >> def repr_ascii(obj):
> >>     return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
> > 
> >    It is hard to apply the function for repr(container).
> > repr(container).encode("unicode_escape") is the only way (at least I don't
> > see any other way).
> 
> I think Atsuo envisioned you to invoke "repr_ascii(container)".

   Who knows what are string representations of the objects in container;
there is a chance .encode() after repr() will escape or unescape the result
in a wrong way. I do not insist on anything (I think printable repr() and
repr().encode("unicode_escape") satisfy my needs) so I'm just pointing
there could be a problem; don't know how important it is.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From martin at v.loewis.de  Sat May  3 22:57:06 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sat, 03 May 2008 22:57:06 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <20080503203617.GA1658@phd.pp.ru>
References: <87hcdkzgj5.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560804291200j5a924a24o571dce15634172cc@mail.gmail.com>	<87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>	<4819F21D.8070808@v.loewis.de>
	<fvcvrd$j4a$1@ger.gmane.org>	<481AF228.2080900@gmail.com>
	<fvg4rc$48q$1@ger.gmane.org>	<797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>	<20080503200214.GA32314@phd.pp.ru>
	<481CC91B.307@v.loewis.de> <20080503203617.GA1658@phd.pp.ru>
Message-ID: <481CD1A2.2030105@v.loewis.de>

>>>> def repr_ascii(obj):
>>>>     return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
>>>    It is hard to apply the function for repr(container).
>>> repr(container).encode("unicode_escape") is the only way (at least I don't
>>> see any other way).
>> I think Atsuo envisioned you to invoke "repr_ascii(container)".
> 
>    Who knows what are string representations of the objects in container;

I know: it's a Unicode object.

> there is a chance .encode() after repr() will escape or unescape the result
> in a wrong way.

No, there is no such chance.

> I do not insist on anything (I think printable repr() and
> repr().encode("unicode_escape") satisfy my needs) so I'm just pointing
> there could be a problem; don't know how important it is.

I don't think there is one (except that any repr_ascii function should
also *decode* its result back into a string before returning it).
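
A minimal sketch of such a function, assuming Python 3 semantics where
repr() returns a str. The name repr_ascii comes from Atsuo's proposal; the
sample values are only illustrative. The encode step yields bytes, so the
result is decoded back to str before it is returned:

```python
def repr_ascii(obj):
    # repr() may contain non-ASCII characters; "backslashreplace" turns each
    # of them into a \xXX, \uXXXX or \UXXXXXXXX escape while encoding.
    encoded = repr(obj).encode("ascii", "backslashreplace")
    # Decode again so that callers get a str rather than bytes.
    return encoded.decode("ascii")


# Containers work too: repr(container) is a single string that is
# escaped as a whole.
print(repr_ascii(["caf\u00e9", "\u20ac"]))  # prints ['caf\xe9', '\u20ac']
```

Python 3 as released also has a built-in ascii() that behaves much the same
way, so a hand-written helper like this is mostly of interest as an
illustration of the encode/decode round trip.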

Regards,
Martin

From phd at phd.pp.ru  Sat May  3 23:09:00 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sun, 4 May 2008 01:09:00 +0400
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <481CD1A2.2030105@v.loewis.de>
References: <87d4o72961.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4819F21D.8070808@v.loewis.de> <fvcvrd$j4a$1@ger.gmane.org>
	<481AF228.2080900@gmail.com> <fvg4rc$48q$1@ger.gmane.org>
	<797440730805021754n5626950fm7582e19f605f9a8a@mail.gmail.com>
	<20080503200214.GA32314@phd.pp.ru> <481CC91B.307@v.loewis.de>
	<20080503203617.GA1658@phd.pp.ru> <481CD1A2.2030105@v.loewis.de>
Message-ID: <20080503210900.GB1658@phd.pp.ru>

On Sat, May 03, 2008 at 10:57:06PM +0200, "Martin v. Löwis" wrote:
> > there is a chance .encode() after repr() will escape or unescape the result
> > in a wrong way.
> 
> No, there is no such chance.

   Ok, then. Probably I was wrong.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From fdrake at acm.org  Sun May  4 01:34:03 2008
From: fdrake at acm.org (Fred Drake)
Date: Sat, 3 May 2008 19:34:03 -0400
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <18460.20940.882777.235301@montanaro-dyndns-org.local>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
	<18460.20940.882777.235301@montanaro-dyndns-org.local>
Message-ID: <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>

On May 3, 2008, at 7:51 AM, skip at pobox.com wrote:
> Fred asked for a --prefix flag (which is what I was voting on).  I  
> don't
> really care what you do by default as long as you give me a way to  
> do it
> differently.

What's most interesting (to me) is that no one's commented on my note  
that my preferred approach would be that there's no default at all;  
the location would have to be specified explicitly.  Whether on the  
command line or in the distutils configuration doesn't matter, but  
explicitness should be required.


   -Fred

-- 
Fred Drake   <fdrake at acm.org>


From ncoghlan at gmail.com  Sun May  4 06:50:45 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 04 May 2008 14:50:45 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>	<18459.43976.85481.758104@montanaro-dyndns-org.local>	<481C2AE7.9010805@gmail.com>	<18460.20940.882777.235301@montanaro-dyndns-org.local>
	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>
Message-ID: <481D40A5.1050905@gmail.com>

Fred Drake wrote:
> On May 3, 2008, at 7:51 AM, skip at pobox.com wrote:
>> Fred asked for a --prefix flag (which is what I was voting on).  I don't
>> really care what you do by default as long as you give me a way to do it
>> differently.
> 
> What's most interesting (to me) is that no one's commented on my note 
> that my preferred approach would be that there's no default at all; the 
> location would have to be specified explicitly.  Whether on the command 
> line or in the distutils configuration doesn't matter, but explicitness 
> should be required.

I thought Christian said something about that defeating one of the main 
points of the PEP - to allow per-user installation of modules to "just 
work" for non-administrators. (It may not have been Christian, and it 
may not have been directly in response to you, but I'm pretty sure I 
read it somewhere in this thread ;)

Anyway, a per-user site-packages directory only "just works" if the 
standard behaviour of a Python installation is to provide access to the 
per-user packages without requiring any additional action on the part of 
the user.

A couple of paragraphs in the PEP may also be of interest to you:

"""For security reasons the user site directory is not added to sys.path 
when the effective user id or group id is not equal to the process uid / 
gid [9]. It's an additional barrier against code injection into suid 
apps. However Python suid scripts must always use the -E and -s option 
or users can sneak in their own code.

The user site directory can be suppressed with a new option -s or the 
environment variable PYTHONNOUSERSITE. The feature can be disabled 
globally by setting site.ENABLE_USER_SITE to the value False. It must be 
set by editing site.py. It can't be altered in sitecustomize.py or later."""

So Python itself turns the feature off automatically for invocation via 
sudo and the like, and the sysadmin can disable the feature completely 
through site.py.
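
A rough way to observe the -s switch described in the PEP text, assuming an
interpreter that ships the PEP 370 implementation (Python 2.6/3.0 or later;
the subprocess call below additionally needs Python 3.7+). The helper name
user_site_enabled is made up for this sketch:

```python
import subprocess
import sys


def user_site_enabled(*interpreter_flags):
    # Start a fresh interpreter and report what site.ENABLE_USER_SITE
    # ended up as after startup.
    cmd = [sys.executable, *interpreter_flags,
           "-c", "import site; print(site.ENABLE_USER_SITE)"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# With -s the user site directory is suppressed, so this prints False.
print(user_site_enabled("-s"))
# Without flags the answer depends on the environment (uid/gid checks,
# PYTHONNOUSERSITE, virtual environments), so no output is claimed here.
print(user_site_enabled())
```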

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From aahz at pythoncraft.com  Sun May  4 01:25:51 2008
From: aahz at pythoncraft.com (Aahz)
Date: Sat, 3 May 2008 16:25:51 -0700
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next
	Wednesday	07-May-2008
In-Reply-To: <68FCCCBB-7DFF-4157-BE40-F816CBA7AA57@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
	<20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com>
	<68FCCCBB-7DFF-4157-BE40-F816CBA7AA57@python.org>
Message-ID: <20080503232550.GB28577@panix.com>

On Fri, May 02, 2008, Barry Warsaw wrote:
> On May 2, 2008, at 1:48 AM, glyph at divmod.com wrote:
>> 
>>In the long term, if everyone followed suit on  
>>~/.local, that would be great.  But I don't want a ~/Python, ~/Java,  
>>~/Ruby, ~/PHP, ~/Perl, ~/OCaml and ~/Erlang and a $PATH as long as  
>>my arm just so I can run a few applications without system- 
>>installing them.
> 
> I hate to send a "me too" messages, but I have to say Glyph is exactly  
> right here.

+1
-- 
Aahz (aahz at pythoncraft.com)           <*>         http://www.pythoncraft.com/

Help a hearing-impaired person: http://rule6.info/hearing.html

From stefan_ml at behnel.de  Sun May  4 14:52:42 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 04 May 2008 14:52:42 +0200
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <ca471dc20805011645s239ec048l8d865703f065ef5d@mail.gmail.com>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>	<fvdk8j$iuh$2@ger.gmane.org>
	<ca471dc20805011645s239ec048l8d865703f065ef5d@mail.gmail.com>
Message-ID: <481DB19A.20605@behnel.de>

Guido van Rossum wrote:
> On Thu, May 1, 2008 at 4:37 PM, Neal Becker <ndbecker2 at gmail.com> wrote:
>> It would be really nice to see support for some other backends, such as Hg
>>  or bzr (which are both written in python), in addition to svn.
> 
> Once it's open source feel free to add those!

trac supports a pretty wide set of VCSes.

http://trac.edgewall.org/wiki/VersioningSystemBackend

Maybe your tools could integrate these backends somehow instead of
re-implementing yet another suite of VCS backend connectors.

Stefan


From skip at pobox.com  Sun May  4 16:14:59 2008
From: skip at pobox.com (skip at pobox.com)
Date: Sun, 4 May 2008 09:14:59 -0500
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
	<18460.20940.882777.235301@montanaro-dyndns-org.local>
	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>
	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
Message-ID: <18461.50403.928218.622685@montanaro-dyndns-org.local>


    glyph> As I've said a dozen times in this thread already, the feature
    glyph> I'd like to get from a per-user installation location is that
    glyph> 'setup.py install', or at least some completely canonical
    glyph> distutils incantation, should work, by default, for non-root
    glyph> users; ideally non-administrators on windows as well as non-root
    glyph> users on unixish platforms.

I'm unclear why anything needs changing then.  At work we have idiosyncratic
central install locations for everything, not just Python.  None of this
stuff is installed by root.  When I want to install some package to test
without polluting the central waters I simply run setup.py install with a
--prefix arg then set PYTHONPATH to pick up my stuff before the central
stuff.  I see no reason to change the behavior of setup.py's install
command.  It gives you the flexibility needed to handle a number of
different scenarios.

Skip

From jnoller at gmail.com  Sun May  4 16:17:36 2008
From: jnoller at gmail.com (Jesse Noller)
Date: Sun, 4 May 2008 10:17:36 -0400
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
	<18460.20940.882777.235301@montanaro-dyndns-org.local>
	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>
	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
Message-ID: <4222a8490805040717u5de66d0cw7eadf471f19fe7b8@mail.gmail.com>

On Sun, May 4, 2008 at 9:58 AM,  <glyph at divmod.com> wrote:
...snip...
>  As I've said a dozen times in this thread already, the feature I'd like to
> get from a per-user installation location is that 'setup.py install', or at
> least some completely canonical distutils incantation, should work, by
> default, for non-root users; ideally non-administrators on windows as well
> as non-root users on unixish platforms.
>

This is a big +1 from me. The way I currently work around the "must be
root to install stuff" on both OS/X and other Lin/Uni(xes) is via
virtualenv.py and a lot of bash environment trickery.  If nothing else
comes out of this, I think what glyph points out is the ideal, and
simplest goal. Ignoring the directory name debate, I would like to see
this local "user" dir mirror the normal directory tree that packages
installed from distutils/setuptools typically use, namely it should
have the: lib/site-packages/<your module here> and bin/<your scripts
here> directories, and a known parent name.

One thing that could be done is pick a default name for the parent,
ala ~/Python - but let users override it with an environment variable
if they so desire (PYTHON_USER_DIR?) so that those who want it hidden
can have it hidden, and those of us who don't, don't.

-jesse

From ncoghlan at gmail.com  Sun May  4 16:22:27 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 05 May 2008 00:22:27 +1000
Subject: [Python-3000] PEP 370 (was Re: [Python-Dev] Reminder: last alphas
 next	Wednesday	07-May-2008)
In-Reply-To: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>	<18459.43976.85481.758104@montanaro-dyndns-org.local>	<481C2AE7.9010805@gmail.com>	<18460.20940.882777.235301@montanaro-dyndns-org.local>	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>
	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
Message-ID: <481DC6A3.70104@gmail.com>

glyph at divmod.com wrote:
> As I've said a dozen times in this thread already, the feature I'd like 
> to get from a per-user installation location is that 'setup.py install', 
> or at least some completely canonical distutils incantation, should 
> work, by default, for non-root users; ideally non-administrators on 
> windows as well as non-root users on unixish platforms.

This is what I see as the goal of PEP 370 as well. Perhaps the PEP could 
be more explicit in spelling that out?

"""The primary goal of this PEP is to provide a standard mechanism 
allowing Python users to install distutils packages for their own use 
without affecting other users of the same machine, and without requiring 
any change to the packages themselves."""

I think the current Rationale section kind of assumes that the reader 
already recognises the above paragraph as the reason for the PEP.

In the UNIX Notes section, the PEP should probably also state that the 
reason for choosing a hidden dot-file directory is that users generally 
aren't going to have any interest in the source files for the Python 
packages that they install, and that users that would prefer for the 
files to be visible can easily make a symbolic link to the directory.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From facundobatista at gmail.com  Sun May  4 16:49:54 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Sun, 4 May 2008 11:49:54 -0300
Subject: [Python-3000] PEP 8 Style Guide and Python 3
In-Reply-To: <ca471dc20805030907y2561e334x62e7d79295d62318@mail.gmail.com>
References: <fvga3g$ga1$1@ger.gmane.org>
	<ca471dc20805030907y2561e334x62e7d79295d62318@mail.gmail.com>
Message-ID: <e04bdf310805040749v6b540ad0qc6008399a053f20@mail.gmail.com>

2008/5/3, Guido van Rossum <guido at python.org>:

>  as needed. I suggest that we preface the 2.x-specific things with
>  words like "in Python 2, ..." but by and large focus the style guide
>  on Py3k. We could even migrate the rules that are only relevant to 2.x
>  to an Appendix-like chapter. That can then be easily deleted at some
>  point in the future.

+1

-- 
.    Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From lists at cheimes.de  Sun May  4 18:14:06 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 04 May 2008 18:14:06 +0200
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>	<18459.43976.85481.758104@montanaro-dyndns-org.local>	<481C2AE7.9010805@gmail.com>	<18460.20940.882777.235301@montanaro-dyndns-org.local>	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>
	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
Message-ID: <481DE0CE.8010306@cheimes.de>

> First, Skip, I *only* care about the default behavior.  There's already
> a way to do it differently: PYTHONPATH.  So, Fred, I think what you're
> arguing for is to drop this feature entirely.  Or is there some other
> use for a new way to allow users to explicitly add something to
> sys.path, aside from PYTHONPATH?  It seems that it would add more
> complexity and I can't see what the value would be.

PYTHONPATH is lacking one feature which is important for lots of
packages and setuptools. The directories in PYTHONPATH are just added to
sys.path. But setuptools require a site package directory. Maybe a new
env var PYTHONSITEPATH could solve the problem.

> As I've said a dozen times in this thread already, the feature I'd like
> to get from a per-user installation location is that 'setup.py install',
> or at least some completely canonical distutils incantation, should
> work, by default, for non-root users; ideally non-administrators on
> windows as well as non-root users on unixish platforms.

The implementation of my PEP provides a new option for install:

$ python setup.py install --user

Is it sufficient for you?

Christian

From lists at cheimes.de  Sun May  4 18:19:17 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 04 May 2008 18:19:17 +0200
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <481C2AE7.9010805@gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
Message-ID: <481DE205.1060108@cheimes.de>

Nick Coghlan schrieb:
> - for experienced users (Barry, skip, etc) that want ~/.local to be more
> easily accessible, creating a visible ~/local symlink is an utterly
> trivial exercise.

Or you can set the environment variable PYTHONUSERBASE to $HOME.
PYTHONUSERBASE is the root directory for user specific data:

def addusersitepackages(known_paths):
    """Add a per user site-package to sys.path

    Each user has its own python directory with site-packages in the
    home directory.

    USER_BASE is the root directory for all Python versions

    USER_SITE is the user specific site-packages directory

    USER_SITE/.. can be used for data.
    """
    global USER_BASE, USER_SITE
    env_base = os.environ.get("PYTHONUSERBASE", None)

    def joinuser(*args):
        return os.path.expanduser(os.path.join(*args))

    #if sys.platform in ('os2emx', 'riscos'):
    #    # Don't know what to put here
    #    USER_BASE = ''
    #    USER_SITE = ''
    if os.name == "nt":
        base = os.environ.get("APPDATA") or "~"
        USER_BASE = env_base if env_base else joinuser(base, "Python")
        USER_SITE = os.path.join(USER_BASE,
                                 "Python" + sys.version[0] + sys.version[2],
                                 "site-packages")
    else:
        USER_BASE = env_base if env_base else joinuser("~", ".local")
        USER_SITE = os.path.join(USER_BASE, "lib",
                                 "python" + sys.version[:3],
                                 "site-packages")

    if os.path.isdir(USER_SITE):
        addsitedir(USER_SITE, known_paths)
    return known_paths
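

The path computation in the function above can be tried out in isolation.
The following is only a sketch, not the site.py code: the function name
user_site_paths and its two parameters are invented so the logic can be
exercised without touching the real environment, and the version part is
built from sys.version_info rather than by slicing sys.version (slicing
would break once a minor version has two digits):

```python
import os
import sys


def user_site_paths(environ=None, os_name=None):
    # Hypothetical helper: returns (user_base, user_site) the way the
    # function above computes them, without modifying sys.path.
    environ = os.environ if environ is None else environ
    os_name = os.name if os_name is None else os_name
    # An unset or empty PYTHONUSERBASE means "use the default location".
    env_base = environ.get("PYTHONUSERBASE") or None
    major, minor = sys.version_info[:2]

    if os_name == "nt":
        appdata = environ.get("APPDATA") or "~"
        base = env_base or os.path.expanduser(os.path.join(appdata, "Python"))
        site_dir = os.path.join(base, "Python%d%d" % (major, minor),
                                "site-packages")
    else:
        base = env_base or os.path.expanduser(os.path.join("~", ".local"))
        site_dir = os.path.join(base, "lib",
                                "python%d.%d" % (major, minor),
                                "site-packages")
    return base, site_dir


# Example: override the base directory, as PYTHONUSERBASE would.
print(user_site_paths({"PYTHONUSERBASE": "/tmp/userbase"}, "posix"))
```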


Christian

From lists at cheimes.de  Sun May  4 18:24:38 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 04 May 2008 18:24:38 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <4222a8490805040717u5de66d0cw7eadf471f19fe7b8@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>	<18459.43976.85481.758104@montanaro-dyndns-org.local>	<481C2AE7.9010805@gmail.com>	<18460.20940.882777.235301@montanaro-dyndns-org.local>	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
	<4222a8490805040717u5de66d0cw7eadf471f19fe7b8@mail.gmail.com>
Message-ID: <481DE346.2040608@cheimes.de>

Jesse Noller schrieb:
> One thing that could be done is pick a default name for the parent,
> ala ~/Python - but let users override it with an environment variable
> if they so desire (PYTHON_USER_DIR?) so that those who want it hidden
> can have it hidden, and those of us who don't, don't.

Has anybody read my PEP or do I need a Christian's English to real
English converter? *g*

From my PEP 370:

---
The path to the user base directory can be overwritten with the
environment variable PYTHONUSERBASE. The default location is used when
PYTHONUSERBASE is not set or empty.
---

PYTHONUSERBASE defaults to ~/.local/ on Unix. In order to install
packages in ~/lib, ~/bin etc directly you can do

export PYTHONUSERBASE=$HOME

in your .bashrc or .profile.

Christian

From jimjjewett at gmail.com  Sun May  4 20:01:08 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sun, 4 May 2008 14:01:08 -0400
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <4819F9E7.9040706@v.loewis.de>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru>
	<4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru>
	<480612CE.1010300@gmail.com>
	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>
	<4807C3C1.6010602@v.loewis.de>
	<ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>
	<4819F9E7.9040706@v.loewis.de>
Message-ID: <fb6fbf560805041101x57380dccx8fc8360c61794631@mail.gmail.com>

On 5/1/08, "Martin v. Löwis" <martin at v.loewis.de> wrote:

>  - escaping looks like this:
>   * \r, \n, \t, \\
>   * \xXX for characters from Latin-1
>   * \uXXXX for characters from the BMP
>   * \U00XXXXXX for anything else

>  What I didn't have in my original proposal was escaping of Zs
>  except for space, which then would also escape NBSP, EN QUAD,
>  EM QUAD, THIN SPACE, HAIR SPACE, OGHAM SPACE MARK, etc. Escaping
>  them is fine also. Also, I didn't consider surrogate pairs in
>  UCS-2 builds originally; they should (of course) get represented
>  as-is.

I realize that this is the traditional escape form, but I wonder if it
might be better to just use the character names instead of the hex
character codes.  The names can be written in ASCII, they are
unambiguous, and they are easier to understand than a random hex
value.

-jJ

From lists at cheimes.de  Sun May  4 21:57:26 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 04 May 2008 21:57:26 +0200
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <481DE4CD.7070401@egenix.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>	<18459.43976.85481.758104@montanaro-dyndns-org.local>	<481C2AE7.9010805@gmail.com>	<18460.20940.882777.235301@montanaro-dyndns-org.local>	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>	<481DE0CE.8010306@cheimes.de>
	<481DE4CD.7070401@egenix.com>
Message-ID: <481E1526.6000903@cheimes.de>

M.-A. Lemburg schrieb:
>> PYTHONPATH is lacking one feature which is important for lots of
>> packages and setuptools. The directories in PYTHONPATH are just added to
>> sys.path. But setuptools require a site package directory. Maybe a new
>> env var PYTHONSITEPATH could solve the problem.
> 
> We don't need another setup variable for this. Just place a
> well-known module into the site-packages/ directory and then
> query its __file__ attribute, e.g.
> 
> site-packages/site_packages.py
> 
> The module could even include a few helpers to query various
> settings which apply to the site packages directory, e.g.
> 
> site_packages.get_dir()
> site_packages.list_packages()
> site_packages.list_modules()
> etc.

I don't see how it is going to solve the use case "Add another site
package directory when I don't have write access to the global site
package directory and I don't want to modify my apps."

> Just in case you don't know...
> 
> python setup.py install --home=~
> 
> will install to ~/lib/python
> 
> The problem is not getting the packages installed in a non-admin
> location. It's about Python looking in a non-admin location per
> default (as well as in the site-packages location).

I know the --home option. For one, the --home option is Unix only and not
supported on Windows. Also, the --user option takes all options of my PEP
370 user site directory into account, including the PYTHONUSERBASE env var.

Christian

From lists at cheimes.de  Sun May  4 21:59:51 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 04 May 2008 21:59:51 +0200
Subject: [Python-3000] PEP 370 (was Re: [Python-Dev] Reminder: last
 alphas next Wednesday 07-May-2008)
In-Reply-To: <481DC6A3.70104@gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>	<18459.43976.85481.758104@montanaro-dyndns-org.local>	<481C2AE7.9010805@gmail.com>	<18460.20940.882777.235301@montanaro-dyndns-org.local>	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
	<481DC6A3.70104@gmail.com>
Message-ID: <481E15B7.9060003@cheimes.de>

Nick Coghlan schrieb:
> This is what I see as the goal of PEP 370 as well. Perhaps the PEP could
> be more explicit in spelling that out?
> 
> """The primary goal of this PEP is to provide a standard mechanism
> allowing Python users to install distutils packages for their own use
> without affecting other users of the same machine, and without requiring
> any change to the packages themselves."""
> 
> I think the current Rationale section kind of assumes that the reader
> already recognises the above paragraph as the reason for the PEP.

Good point ;)
The author of the PEP was kinda sure all readers would recognize the
rationale. Again, explicit is better than implicit. I'll update the PEP later.

> In the UNIX Notes section, the PEP should probably also state that the
> reason for choosing a hidden dot-file directory is that users generally
> aren't going to have any interest in the source files for the Python
> packages that they install, and that users that would prefer for the
> files to be visible can easily make a symbolic link to the directory.

Good point, too.

Thanks Nick!

Christian

From stephen at xemacs.org  Sun May  4 23:02:56 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Mon, 05 May 2008 06:02:56 +0900
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <fb6fbf560805041101x57380dccx8fc8360c61794631@mail.gmail.com>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4805ECE1.6040501@gmail.com> <20080416124529.GC8598@phd.pp.ru>
	<4805FD56.6070902@gmail.com> <20080416133046.GB16087@phd.pp.ru>
	<480612CE.1010300@gmail.com>
	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>
	<4807C3C1.6010602@v.loewis.de>
	<ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>
	<4819F9E7.9040706@v.loewis.de>
	<fb6fbf560805041101x57380dccx8fc8360c61794631@mail.gmail.com>
Message-ID: <874p9dwznj.fsf@uwakimon.sk.tsukuba.ac.jp>

Jim Jewett writes:

 > I realize that this is the traditional escape form, but I wonder if it
 > might be better to just use the character names instead of the hex
 > character codes.

That would require changing the parser, no?  Of all types, string had
better roundtrip through repr()!


From ncoghlan at gmail.com  Mon May  5 04:22:28 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 05 May 2008 12:22:28 +1000
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <874p9dwznj.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<4805ECE1.6040501@gmail.com>
	<20080416124529.GC8598@phd.pp.ru>	<4805FD56.6070902@gmail.com>
	<20080416133046.GB16087@phd.pp.ru>	<480612CE.1010300@gmail.com>	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>	<4807C3C1.6010602@v.loewis.de>	<ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>	<4819F9E7.9040706@v.loewis.de>	<fb6fbf560805041101x57380dccx8fc8360c61794631@mail.gmail.com>
	<874p9dwznj.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <481E6F64.80902@gmail.com>

Stephen J. Turnbull wrote:
> Jim Jewett writes:
> 
>  > I realize that this is the traditional escape form, but I wonder if it
>  > might be better to just use the character names instead of the hex
>  > character codes.
> 
> That would require changing the parser, no?  Of all types, string had
> better roundtrip through repr()!

The string parser has understood Unicode names for quite some time 
(examples use 2.5.1):

 >>> print u"\N{GREEK SMALL LETTER ALPHA}"
α
 >>> print u"\N{GREEK CAPITAL LETTER ALPHA}"
Α
 >>> print u"\N{GREEK CAPITAL LETTER ALPHA WITH TONOS}"
Ά

Using the names gets fairly verbose compared to the hex escapes though:

 >>> u"\N{GREEK SMALL LETTER ALPHA}"
u'\u03b1'
 >>> u"\N{GREEK CAPITAL LETTER ALPHA}"
u'\u0391'
 >>> u"\N{GREEK CAPITAL LETTER ALPHA WITH TONOS}"
u'\u0386'
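
For anyone following along on Python 3, here is a rough sketch of the same
comparison. It assumes only the standard unicodedata module; name_escape()
is a made-up helper for illustration, not part of any patch discussed here.

```python
import unicodedata

def name_escape(ch):
    # Hypothetical helper: spell a single character as a \N{...} escape.
    return "\\N{%s}" % unicodedata.name(ch)

for ch in "\u03b1\u0391\u0386":
    named = name_escape(ch)
    hexed = "\\u%04x" % ord(ch)
    # The name maps back to the same character, so either spelling round-trips.
    assert unicodedata.lookup(unicodedata.name(ch)) == ch
    print(hexed, len(hexed), named, len(named))
```

The hex form is always six characters for BMP code points, while the named
form grows with the length of the character name.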

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Mon May  5 05:38:04 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 4 May 2008 20:38:04 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
Message-ID: <ca471dc20805042038x73ebba36v63ce336dc6e1234c@mail.gmail.com>

This code is now open source! Browse it here:

  http://code.google.com/p/rietveld/source/browse

--Guido

On Thu, May 1, 2008 at 9:41 AM, Guido van Rossum <guido at python.org> wrote:
> Some of you may have seen a video recorded in November 2006 where I
>  showed off Mondrian, a code review tool that I was developing for
>  Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped
>  that I could release Mondrian as open source, but it was not to be:
>  due to its popularity inside Google, it became more and more tied to
>  proprietary Google infrastructure like Bigtable, and it remained
>  limited to Perforce, the commercial revision control system most used
>  at Google.
>
>  What I'm announcing now is the next best thing: a code review tool
>  for use with Subversion, inspired by Mondrian and (soon to be)
>  released as open source. Some of the code is even directly derived
>  from Mondrian. Most of the code is new though, written using Django
>  and running on Google App Engine.
>
>  I'm inviting the Python developer community to try out the tool on the
>  web for code reviews. I've added a few code reviews already, but I'm
>  hoping that more developers will upload at least one patch for review
>  and invite a reviewer to try it out.
>
>  To try it out, go here:
>
>     http://codereview.appspot.com
>
>  Please use the Help link in the top right to read more on how to use
>  the app. Please sign in using your Google Account (either a Gmail
>  address or a non-Gmail address registered with Google) to interact
>  more with the app (you need to be signed in to create new issues and
>  to add comments to existing issues).
>
>  Don't hesitate to drop me a note with feedback -- note though that
>  there are a few known issues listed at the end of the Help page. The
>  Help page is really a wiki, so feel free to improve it!
>
>  --
>  --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon May  5 05:42:15 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 4 May 2008 20:42:15 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <ca471dc20805042038x73ebba36v63ce336dc6e1234c@mail.gmail.com>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<ca471dc20805042038x73ebba36v63ce336dc6e1234c@mail.gmail.com>
Message-ID: <ca471dc20805042042k3c3bc2e3wac7165b4ecbeb8c1@mail.gmail.com>

I forgot -- you need to link or copy the 'django' directory from
Django 0.97.pre into the app directory. Otherwise you'll be using the
Django 0.96.1 that's included with the AppEngine runtime, and the code
is not compatible with that version.

On Sun, May 4, 2008 at 8:38 PM, Guido van Rossum <guido at python.org> wrote:
> This code is now open source! Browse it here:
>
>   http://code.google.com/p/rietveld/source/browse
>
>  --Guido
>
>
>
>  On Thu, May 1, 2008 at 9:41 AM, Guido van Rossum <guido at python.org> wrote:
>  > Some of you may have seen a video recorded in November 2006 where I
>  >  showed off Mondrian, a code review tool that I was developing for
>  >  Google (http://www.youtube.com/watch?v=sMql3Di4Kgc). I've always hoped
>  >  that I could release Mondrian as open source, but it was not to be:
>  >  due to its popularity inside Google, it became more and more tied to
>  >  proprietary Google infrastructure like Bigtable, and it remained
>  >  limited to Perforce, the commercial revision control system most used
>  >  at Google.
>  >
>  >  What I'm announcing now is the next best thing: a code review tool
>  >  for use with Subversion, inspired by Mondrian and (soon to be)
>  >  released as open source. Some of the code is even directly derived
>  >  from Mondrian. Most of the code is new though, written using Django
>  >  and running on Google App Engine.
>  >
>  >  I'm inviting the Python developer community to try out the tool on the
>  >  web for code reviews. I've added a few code reviews already, but I'm
>  >  hoping that more developers will upload at least one patch for review
>  >  and invite a reviewer to try it out.
>  >
>  >  To try it out, go here:
>  >
>  >     http://codereview.appspot.com
>  >
>  >  Please use the Help link in the top right to read more on how to use
>  >  the app. Please sign in using your Google Account (either a Gmail
>  >  address or a non-Gmail address registered with Google) to interact
>  >  more with the app (you need to be signed in to create new issues and
>  >  to add comments to existing issues).
>  >
>  >  Don't hesitate to drop me a note with feedback -- note though that
>  >  there are a few known issues listed at the end of the Help page. The
>  >  Help page is really a wiki, so feel free to improve it!
>  >
>  >  --
>  >  --Guido van Rossum (home page: http://www.python.org/~guido/)
>  >
>
>
>
>  --
>  --Guido van Rossum (home page: http://www.python.org/~guido/)
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From schmir at gmail.com  Mon May  5 13:28:06 2008
From: schmir at gmail.com (Ralf Schmitt)
Date: Mon, 5 May 2008 13:28:06 +0200
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
Message-ID: <932f8baf0805050428t2cd31b00x9c60d8ef43b5828c@mail.gmail.com>

On Thu, May 1, 2008 at 10:26 PM, Barry Warsaw <barry at python.org> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> This is a reminder that the LAST planned alpha releases of Python 2.6 and
> 3.0 are scheduled for next Wednesday, 07-May-2008.  Please be diligent over
> the next week so that none of your changes break Python.  The stable
> buildbots look moderately okay, let's see what we can do about getting them
> all green:
>
> http://www.python.org/dev/buildbot/stable/
>
> We have a few showstopper bugs, and I will be looking at these more
> carefully starting next week.
>
>
> http://bugs.python.org/issue?@columns=title,id,activity,versions,status&@sort=activity&@filter=priority,status&@pagesize=50&@startwith=0&priority=1&status=1&@dispname=Showstoppers
>

Running the test suite segfaults in test_pyexpat on my 64-bit Debian
testing system. This does not happen in a debug build:

test_pyclbr
test_pydoc
test_pyexpat

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2b573851a6e0 (LWP 19486)]
0x0000000000000000 in ?? ()
(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x00002aaaaf694f0a in doContent (parser=0x1b4bab0, startTagLevel=0,
    enc=0x1b4dba0,
    s=0x1b4cb3c "</s>", ' ' <repeats 23 times>,
"frozenset([frozenset([2]),\n", ' ' <repeats 67 times>, "frozenset([0,\n", '
' <repeats 65 times>...,
    end=0x1b4cb40 ' ' <repeats 23 times>, "frozenset([frozenset([2]),\n", '
' <repeats 67 times>, "frozenset([0,\n", ' ' <repeats 69 times>...,
    nextPtr=0x1b4bae0, haveMore=1 '\001')
    at extensions/expat/lib/xmlparse.c:2540
#2  0x00002aaaaf6972ee in contentProcessor (parser=0x1b4bab0, start=0x0,
    end=0x1b4c470 "        q\001", endPtr=0x0)
    at extensions/expat/lib/xmlparse.c:2003
#3  0x00002aaaaf698662 in doProlog (parser=0x1b4bab0, enc=0x1b4dba0,
    s=0x1b4c738 "<s>", 'a' <repeats 197 times>...,
    end=0x1b4cb40 ' ' <repeats 23 times>, "frozenset([frozenset([2]),\n", '
' <repeats 67 times>, "frozenset([0,\n", ' ' <repeats 69 times>..., tok=29,
    next=0x1b4c738 "<s>", 'a' <repeats 197 times>..., nextPtr=0x1b4bae0,
    haveMore=1 '\001') at extensions/expat/lib/xmlparse.c:3803
#4  0x00002aaaaf69adc3 in prologInitProcessor (parser=0x1b4bab0,
    s=0x1b4c710 "<?xml version='1.0' encoding='iso8859'?><s>", 'a' <repeats
157 times>...,
    end=0x1b4cb40 ' ' <repeats 23 times>, "frozenset([frozenset([2]),\n", '
' <repeats 67 times>, "frozenset([0,\n", ' ' <repeats 69 times>...,
    nextPtr=0x1b4bae0) at extensions/expat/lib/xmlparse.c:3551
#5  0x00002aaaaf68cc61 in XML_ParseBuffer (parser=0x1d20670, len=28625724,
    isFinal=0) at extensions/expat/lib/xmlparse.c:1562
#6  0x00002aaaaf689467 in xmlparse_Parse (self=0x1d20670,
    args=<value optimized out>) at extensions/pyexpat.c:922
#7  0x0000000000419b9d in PyObject_Call (func=0x1a52b48, arg=0x2b7a710,
    kw=0x1b4c610) at Objects/abstract.c:2490
#8  0x00000000004902f8 in PyEval_EvalFrameEx (f=0x8cba40,
    throwflag=<value optimized out>) at Python/ceval.c:3944
#9  0x0000000000494824 in PyEval_EvalCodeEx (co=0x2b5738764558,
    globals=<value optimized out>, locals=<value optimized out>,
    args=0x23aa130, argcount=4, kws=0x23aa150, kwcount=0, defs=0x0,
    defcount=0, closure=0x0) at Python/ceval.c:2908

From martin at v.loewis.de  Mon May  5 18:30:30 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Mon, 05 May 2008 18:30:30 +0200
Subject: [Python-3000] Displaying strings containing unicode escapes
In-Reply-To: <481E6F64.80902@gmail.com>
References: <87ve2jdo17.fsf@uwakimon.sk.tsukuba.ac.jp>	<4805ECE1.6040501@gmail.com>	<20080416124529.GC8598@phd.pp.ru>	<4805FD56.6070902@gmail.com>	<20080416133046.GB16087@phd.pp.ru>	<480612CE.1010300@gmail.com>	<ca471dc20804160843s72976374n52cefae75a9af10e@mail.gmail.com>	<4807C3C1.6010602@v.loewis.de>	<ca471dc20804301036l4ecff247ne3a478968ac9b649@mail.gmail.com>	<4819F9E7.9040706@v.loewis.de>	<fb6fbf560805041101x57380dccx8fc8360c61794631@mail.gmail.com>	<874p9dwznj.fsf@uwakimon.sk.tsukuba.ac.jp>
	<481E6F64.80902@gmail.com>
Message-ID: <481F3626.5010207@v.loewis.de>

> Using the names gets fairly verbose compared to the hex escapes though:
> 
>>>> u"\N{GREEK SMALL LETTER ALPHA}"
> u'\u03b1'
>>>> u"\N{GREEK CAPITAL LETTER ALPHA}"
> u'\u0391'
>>>> u"\N{GREEK CAPITAL LETTER ALPHA WITH TONOS}"
> u'\u0386'

The extreme case (in Python 2.5) is

py> u"\N{ARABIC LIGATURE UIGHUR KIRGHIZ YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"
u'\ufbf9'


Regards,
Martin

From martin at v.loewis.de  Mon May  5 18:46:32 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 05 May 2008 18:46:32 +0200
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <ca471dc20805042038x73ebba36v63ce336dc6e1234c@mail.gmail.com>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<ca471dc20805042038x73ebba36v63ce336dc6e1234c@mail.gmail.com>
Message-ID: <481F39E8.2010904@v.loewis.de>

> This code is now open source! Browse it here:
> 
>   http://code.google.com/p/rietveld/source/browse

Are you also going to call it Rietveld then? Sounds better
to me than "the open source code review tool".

Regards,
Martin

From guido at python.org  Mon May  5 19:24:56 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 5 May 2008 10:24:56 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <481F39E8.2010904@v.loewis.de>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<ca471dc20805042038x73ebba36v63ce336dc6e1234c@mail.gmail.com>
	<481F39E8.2010904@v.loewis.de>
Message-ID: <ca471dc20805051024v4b88f28fvc0b713529dd08e68@mail.gmail.com>

On Mon, May 5, 2008 at 9:46 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > This code is now open source! Browse it here:
>  >
>  >   http://code.google.com/p/rietveld/source/browse
>
>  Are you also going to call it Rietveld then? Sounds better
>  to me than "the open source code review tool".

I've been reluctant to use the Rietveld name too much since Americans
can't spell it. :-) But the open source project *is* called Rietveld,
so I suppose I should start using that name...

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Mon May  5 19:32:43 2008
From: skip at pobox.com (skip at pobox.com)
Date: Mon, 5 May 2008 12:32:43 -0500
Subject: [Python-3000] [Python-Dev] Invitation to try out open source
 code review tool
In-Reply-To: <ca471dc20805051024v4b88f28fvc0b713529dd08e68@mail.gmail.com>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<ca471dc20805042038x73ebba36v63ce336dc6e1234c@mail.gmail.com>
	<481F39E8.2010904@v.loewis.de>
	<ca471dc20805051024v4b88f28fvc0b713529dd08e68@mail.gmail.com>
Message-ID: <18463.17595.324578.284849@montanaro-dyndns-org.local>


    Guido> I've been reluctant to use the Rietveld name too much since
    Guido> Americans can't spell it. :-) But the open source project *is*
    Guido> called Rietveld, so I suppose I should start using that name...

Which reminds me...  What's it mean?  All I saw was a Dutch city and
(maybe?) a Dutch architect by that name.

Skip

From guido at python.org  Mon May  5 19:33:57 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 5 May 2008 10:33:57 -0700
Subject: [Python-3000] [Python-Dev] Invitation to try out open source
	code review tool
In-Reply-To: <18463.17595.324578.284849@montanaro-dyndns-org.local>
References: <ca471dc20805010941k1f756fcax91e6f34fb72435fd@mail.gmail.com>
	<ca471dc20805042038x73ebba36v63ce336dc6e1234c@mail.gmail.com>
	<481F39E8.2010904@v.loewis.de>
	<ca471dc20805051024v4b88f28fvc0b713529dd08e68@mail.gmail.com>
	<18463.17595.324578.284849@montanaro-dyndns-org.local>
Message-ID: <ca471dc20805051033k648cdfc7k97f25b115fffbbda@mail.gmail.com>

On Mon, May 5, 2008 at 10:32 AM,  <skip at pobox.com> wrote:
>
>     Guido> I've been reluctant to use the Rietveld name too much since
>     Guido> Americans can't spell it. :-) But the open source project *is*
>     Guido> called Rietveld, so I suppose I should start using that name...
>
>  Which reminds me...  What's it mean?  All I saw was a Dutch city and
>  (maybe?) a Dutch architect by that name.
>
>  Skip
>

http://code.google.com/p/rietveld/wiki/CodeReviewBackground

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue May  6 00:33:40 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 5 May 2008 15:33:40 -0700
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
Message-ID: <ca471dc20805051533g2943483cyf45050d80f60508c@mail.gmail.com>

On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon <brett at python.org> wrote:
>  After two false starts over the YEARS of trying to cleanup and
>  reorganize the stdlib, creating a SIG to get this going, having Guido
>  give the PEP the once-over over the past several days, and creating
>  two new bugs reports (issues 2715 and 2716), PEP 3108 is finally ready
>  for public vetting!

I've accepted this PEP. Everyone, get to work on implementing this!
I'm sure some small nits will come up during the work that nobody
anticipated during the PEP discussion. In that case, let's be flexible
and work to update the PEP with the best possible solution.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Tue May  6 01:20:51 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 5 May 2008 16:20:51 -0700
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <ca471dc20805051533g2943483cyf45050d80f60508c@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<ca471dc20805051533g2943483cyf45050d80f60508c@mail.gmail.com>
Message-ID: <bbaeab100805051620w19b8cf65see16cb4ebd213bc1@mail.gmail.com>

On Mon, May 5, 2008 at 3:33 PM, Guido van Rossum <guido at python.org> wrote:
> On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon <brett at python.org> wrote:
>  >  After two false starts over the YEARS of trying to cleanup and
>  >  reorganize the stdlib, creating a SIG to get this going, having Guido
>  >  give the PEP the once-over over the past several days, and creating
>  >  two new bugs reports (issues 2715 and 2716), PEP 3108 is finally ready
>  >  for public vetting!
>
>  I've accepted this PEP.

Woohoo!

> Everyone, get to work on implementing this!
>  I'm sure some small nits will come up during the work that nobody
>  anticipated during the PEP discussion. In that case, let's be flexible
>  and work to update the PEP with the best possible solution.

And use the PEP to keep track of what state everything is in!
Hopefully I will start work on this tonight or tomorrow.

-Brett

From musiccomposition at gmail.com  Tue May  6 02:03:35 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Mon, 5 May 2008 19:03:35 -0500
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <bbaeab100805051620w19b8cf65see16cb4ebd213bc1@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<ca471dc20805051533g2943483cyf45050d80f60508c@mail.gmail.com>
	<bbaeab100805051620w19b8cf65see16cb4ebd213bc1@mail.gmail.com>
Message-ID: <1afaf6160805051703y2e3ee58fv5ecddeaaf29dc0c5@mail.gmail.com>

On Mon, May 5, 2008 at 6:20 PM, Brett Cannon <brett at python.org> wrote:
> On Mon, May 5, 2008 at 3:33 PM, Guido van Rossum <guido at python.org> wrote:
>  >
>  >  I've accepted this PEP.
>
>  Woohoo!

Congrats!

>
>  > Everyone, get to work on implementing this!
>  >  I'm sure some small nits will come up during the work that nobody
>  >  anticipated during the PEP discussion. In that case, let's be flexible
>  >  and work to update the PEP with the best possible solution.
>
>  And use the PEP to keep track of what state everything is in!
>  Hopefully I will start work on this tonight or tomorrow.

What can I/we do to help?


-- 
Cheers,
Benjamin Peterson

From ishimoto at gembook.org  Tue May  6 02:56:24 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Tue, 6 May 2008 09:56:24 +0900
Subject: [Python-3000] PEP 3108 - String representation in Python 3000
Message-ID: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>

I've written a PEP for new string representation in Python 3000.

Patch is updated at http://bugs.python.org/issue2630, and Guido
updated a patch to Rietveld:
http://codereview.appspot.com/767 .

I would appreciate your comments and help.

-----------------------------------------------
PEP: 3138
Title: String representation in Python 3000
Version: $Revision$
Last-Modified: $Date$
Author: Atsuo Ishimoto <ishimoto--at--gembook.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 05-May-2008
Post-History:

Abstract
========

This PEP proposes a new string representation form for Python 3000. In
Python prior to Python 3000, the repr() built-in function converts
arbitrary objects to printable ASCII strings for debugging and logging.
For Python 3000, a wider range of characters, based on the Unicode
standard, should be considered 'printable'.


Motivation
==========

The current repr() converts 8-bit strings to ASCII using the following
algorithm.

- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

- Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII
  characters(>=0x80) to '\\xXX'.

- Backslash-escape quote characters(' or ") and add quote character at
  head and tail.

For Unicode strings, the following additional conversions are done.

- Convert leading surrogate pair characters without trailing character
  (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

- Convert 16-bit characters(>=0x100) to '\\uXXXX'.

- Convert 21-bit characters(>=0x10000) and surrogate pair characters to
  '\\U00xxxxxx'.

This algorithm converts any string to printable ASCII, and repr() is
used as a handy and safe way to print strings for debugging or for
logging. Although all non-ASCII characters are escaped, this does not
matter when most of the string's characters are ASCII. But for other
languages, such as Japanese where most characters in a string are not
ASCII, this is very inconvenient. Python 3000 has a lot of nice features
for non-Latin users such as non-ASCII identifiers, so it would be
helpful if Python could also progress in a similar way for printable
output.

Some users might be concerned that such output will mess up their
console if they print binary data like images. But this is unlikely to
happen in practice because bytes and strings are different types in
Python 3000, so printing an image to the console won't mess it up.

This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected.


Specification
=============

- The algorithm to build repr() strings should be changed to:

  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

  * Convert other non-printable ASCII characters(0x00-0x1f, 0x7f) to
    '\\xXX'.

  * Convert leading surrogate pair characters without trailing character
    (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

  * Convert Unicode whitespace other than ASCII space('\\x20') and
    control characters (categories Z* and C* in the Unicode database)
    to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.

- Set the Unicode error-handler for sys.stdout and sys.stderr to
  'backslashreplace' by default.
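
The escaping rules above can be sketched in pure Python. This is only a rough
model under stated assumptions, not the reference implementation: the name
sketch_repr() is hypothetical, it always quotes with apostrophes instead of
choosing a quote character, and it relies on the unicodedata module for the
category lookup.

```python
import unicodedata

def sketch_repr(s):
    # Hypothetical model of the proposed repr() for strings.
    simple = {"\r": "\\r", "\n": "\\n", "\t": "\\t", "\\": "\\\\", "'": "\\'"}
    out = []
    for ch in s:
        cp = ord(ch)
        if ch in simple:
            out.append(simple[ch])
        elif ch != " " and unicodedata.category(ch)[0] in "CZ":
            # Categories C* and Z* (other than ASCII space) are escaped;
            # this also covers lone surrogates, which are category Cs.
            if cp <= 0xFF:
                out.append("\\x%02x" % cp)
            elif cp <= 0xFFFF:
                out.append("\\u%04x" % cp)
            else:
                out.append("\\U%08x" % cp)
        else:
            out.append(ch)  # everything else is kept as-is
    return "'" + "".join(out) + "'"

print(sketch_repr("a\tb"))
print(sketch_repr("\u65e5\u672c\u8a9e"))
print(sketch_repr("line\u2028sep"))
```

Note that printable non-ASCII text passes through unchanged, which is the
whole point of the proposal, while separators and control characters are
still escaped.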


Rationale
=========

The repr() in Python 3000 should be Unicode not ASCII based, just like
Python 3000 strings. Also, conversion should not be affected by the
locale setting, because the locale is not necessarily the same as the
output device's locale. For example, it is common for a daemon process
to be invoked in an ASCII setting, but writes UTF-8 to its log files.

Characters not supported by the user's console are hex-escaped on
printing by the Unicode encoder's error-handler. If the error-handler of
the output file is 'backslashreplace', such characters are hex-escaped
without raising UnicodeEncodeError. For example, if your default
encoding is ASCII, ``print('¢')`` will print '\\xa2'. If your encoding
is ISO-8859-1, '¢' will be printed.
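
The effect of the error-handler can be tried without touching sys.stdout.
The sketch below assumes the standard io module and wraps an in-memory
buffer; bytes_written() is a hypothetical helper name used only for
illustration.

```python
import io

def bytes_written(text, encoding):
    # Hypothetical helper: what a stream with errors='backslashreplace' emits.
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors="backslashreplace")
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # leave the BytesIO open and owned by the caller
    return data

print(bytes_written("\xa2", "ascii"))       # escaped, no UnicodeEncodeError
print(bytes_written("\xa2", "iso-8859-1"))  # encodable, written as one byte
```

With the ASCII encoding the CENT SIGN comes out as the four bytes of the
escape sequence; with ISO-8859-1 it is written as the single byte 0xa2.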


Printable characters
--------------------

The Unicode standard doesn't define non-printable characters, so we must
create our own definition. Here we propose to define non-printable
characters as follows.

- Non-printable ASCII characters, as in Python 2.

- Broken surrogate pair characters.

- Characters defined in the Unicode character database as

  * Cc (Other, Control)
  * Cf (Other, Format)
  * Cs (Other, Surrogate)
  * Co (Other, Private Use)
  * Cn (Other, Not Assigned)
  * Zl (Separator, Line), which refers to LINE SEPARATOR ('\\u2028')
  * Zp (Separator, Paragraph), which refers to PARAGRAPH SEPARATOR
    ('\\u2029')
  * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
    this category should be escaped to avoid ambiguity.
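
The definition above can be checked against the Unicode character database
with a few lines of Python. This is a minimal sketch assuming the standard
unicodedata module; is_nonprintable() is a hypothetical helper name.

```python
import unicodedata

def is_nonprintable(ch):
    # Hypothetical helper modelling the definition above for one character.
    if ch == " ":
        return False  # ASCII space stays printable
    # Every listed category (Cc, Cf, Cs, Co, Cn, Zl, Zp, Zs) starts with C or Z.
    return unicodedata.category(ch)[0] in "CZ"

samples = ["a", " ", "\x7f", "\u00a0", "\u200b", "\u2028",
           "\ud800", "\ue000", "\u3042"]
for ch in samples:
    print("U+%04X" % ord(ch), unicodedata.category(ch), is_nonprintable(ch))
```

Present-day Python 3 offers a comparable test as the str.isprintable()
method, which can be used to cross-check individual characters.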


Alternate Solutions
-------------------

To help debugging in non-Latin languages without changing repr(), other
suggestions were made.

- Supply a tool to print lists or dicts.

 Strings to be printed for debugging are not only contained by lists or
 dicts, but also in many other types of object. File objects contain a
 file name in Unicode, exception objects contain a message in Unicode,
 etc. These strings should be printed in readable form when repr()ed.
 It is unlikely to be possible to implement a tool to print all
 possible object types.

- Use sys.displayhook and sys.excepthook.

 For interactive sessions, we can write hooks to restore hex-escaped
 characters to the original characters. But these hooks are called only
 when printing the result of evaluating an expression entered in an
 interactive Python session, and don't work for the print() function or
 for non-interactive sessions.

- Subclass sys.stdout and sys.stderr.

 It is difficult to implement a subclass to restore hex-escaped
 characters since there isn't enough information left by the time it's
 a string to undo the escaping correctly in all cases. For example,
 ``print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
 the file object has no way to tell the two cases apart.

- Make the encoding used by unicode_repr() adjustable.

 There is no benefit in preserving the current repr() behavior to make
 application/library authors aware of non-ASCII repr(). And selecting
 an encoding on printing is more flexible than having a global setting.
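
The ambiguity behind the "subclass sys.stdout" alternative can be shown in a
few lines. This sketch assumes present-day Python 3, where the ascii()
builtin produces the ASCII-only escaping; the variable names are
illustrative only.

```python
# ascii() gives the ASCII-only escaping that repr() performs in Python 2.
from_repr = ascii("\u3042")        # one character, HIRAGANA LETTER A
typed_literally = "'\\u3042'"      # eight characters the user really wrote

# A stream subclass only ever sees the text, and the text is identical.
same_text = (from_repr == typed_literally)
print(from_repr, typed_literally, same_text)
```

Because both values reach the stream as the same eight characters, a
subclass that turns escapes back into characters would corrupt the second
one.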


Open Issues
===========

- A lot of people use UTF-8 for their encoding, for example, en_US.utf8
  and de_DE.utf8. In such cases, the backslashreplace trick doesn't work.


Backwards Compatibility
=======================

Changing repr() may break some existing code, especially testing code.
Five of Python's regression tests fail with this modification. If you
need repr() strings without non-ASCII characters, as in Python 2, you can
use the following function.

::

 def repr_ascii(obj):
     return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
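
A short usage sketch for the helper follows. The function is repeated so the
snippet runs on its own; the sample values are arbitrary and chosen only for
illustration.

```python
def repr_ascii(obj):
    # Same helper as above, repeated so this snippet is self-contained.
    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

text = "\u65e5\u672c\u8a9e"   # three CJK characters
print(repr_ascii(text))       # escaped with \uXXXX sequences
print(repr_ascii([text, 1]))  # also works for containers holding strings
print(repr_ascii("abc"))      # pure ASCII is left untouched
```

Since the escaping happens at encode time, the helper works for any object
whose repr() contains non-ASCII characters, not only for strings.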


Reference Implementation
========================

http://bugs.python.org/issue2630


References
==========

.. [1] Multibyte string on string::string_print
       (http://bugs.python.org/issue479898)


Copyright
=========

This document has been placed in the public domain.

From ishimoto at gembook.org  Tue May  6 03:00:22 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Tue, 6 May 2008 10:00:22 +0900
Subject: [Python-3000] PEP 3138 - String representation in Python 3000
Message-ID: <797440730805051800w15a13a8eodc56e2cde72f9177@mail.gmail.com>

Oops, I got the PEP number wrong in the subject!  "PEP 3138 - String
representation in Python 3000" is the correct subject.

From phd at phd.pp.ru  Tue May  6 08:09:17 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 6 May 2008 10:09:17 +0400
Subject: [Python-3000] PEP 3138 - String representation in Python 3000
In-Reply-To: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
Message-ID: <20080506060917.GA29253@phd.pp.ru>

Hello! Well done! Thank you!

On Tue, May 06, 2008 at 09:56:24AM +0900, Atsuo Ishimoto wrote:
> I've written a PEP for new string representation in Python 3000.
> 
> Patch is updated at http://bugs.python.org/issue2630, and Guido
> updated a patch to Rietveld:
> http://codereview.appspot.com/767 .
> 
> I would appreciate your comments and help.
> 
> -----------------------------------------------
> PEP: 3138
> Title: String representation in Python 3000
> Version: $Revision$
> Last-Modified: $Date$
> Author: Atsuo Ishimoto <ishimoto--at--gembook.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 05-May-2008
> Post-History:
> 
> Abstract
> ========
> 
> This PEP proposes a new string representation form for Python 3000. In
> Python prior to Python 3000, the repr() built-in function converts
> arbitrary objects to printable ASCII strings for debugging and logging.
> For Python 3000, a wider range of characters, based on the Unicode
> standard, should be considered 'printable'.
> 
> 
> Motivation
> ==========
> 
> The current repr() converts 8-bit strings to ASCII using the following
> algorithm.
> 
> - Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
> 
> - Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII
>   characters(>=0x80) to '\\xXX'.
> 
> - Backslash-escape quote characters(' or ")

   Currently Python doesn't escape double-quote ("), it only escapes
apostrophe (').

> and add quote character at

   "and add the quote character (apostrophe, ') at"

>   head and tail.

   I think they are "the beginning" and "the end" of the string.

> For Unicode strings, the following additional conversions are done.
> 
> - Convert leading surrogate pair characters without trailing character
>   (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
> 
> - Convert 16-bit characters(>=0x100) to '\\uXXXX'.
> 
> - Convert 21-bit characters(>=0x10000) and surrogate pair characters to
>   '\\U00xxxxxx'.
> 
> This algorithm converts any string to printable ASCII, and repr() is
> used as a handy and safe way to print strings for debugging or for
> logging. Although all non-ASCII characters are escaped, this does not
> matter when most of the string's characters are ASCII. But for other
> languages, such as Japanese where most characters in a string are not
> ASCII, this is very inconvenient. Python 3000 has a lot of nice features
> for non-Latin users such as non-ASCII identifiers, so it would be
> helpful if Python could also progress in a similar way for printable
> output.
> 
> Some users might be concerned that such output will mess up their
> console if they print binary data like images. But this is unlikely to
> happen in practice because bytes and strings are different types in
> Python 3000, so printing an image to the console won't mess it up.
> 
> This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected.
> 
> 
> Specification
> =============
> 
> - The algorithm to build repr() strings should be changed to:
> 
>   * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
> 
>   * Convert other non-printable ASCII characters(0x00-0x1f, 0x7f) to
>     '\\xXX'.
> 
>   * Convert leading surrogate pair characters without trailing character
>     (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
> 
>   * Convert Unicode whitespace other than ASCII space('\\x20') and
>     control characters (categories Z* and C* in the Unicode database)
>     to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.
> 
> - Set the Unicode error-handler for sys.stdout and sys.stderr to
>   'backslashreplace' by default.
> 
> 
> Rationale
> =========
> 
> The repr() in Python 3000 should be Unicode not ASCII based, just like
> Python 3000 strings. Also, conversion should not be affected by the
> locale setting, because the locale is not necessarily the same as the
> output device's locale. For example, it is common for a daemon process
> to be invoked in an ASCII setting, but writes UTF-8 to its log files.

   Not only to log files. HTTP daemons, e.g., run with one locale but
answer to all kinds of clients.

> Characters not supported by the user's console are hex-escaped on printing,
> by the Unicode encoders' error-handler. If the error-handler of the
> output file is 'backslashreplace', such characters are hex-escaped
> without raising UnicodeEncodeError. For example, if your default
> encoding is ASCII, ``print('¢')`` will print '\\xa2'. If your encoding
> is ISO-8859-1, '¢' will be printed.
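
   A minimal sketch of the error-handler behaviour described above. It
assumes the character in the example is U+00A2 (CENT SIGN), which is what
the '\xa2' escape refers to, and it calls str.encode() directly instead of
going through sys.stdout, so it does not depend on the console setup.

```python
# U+00A2 (CENT SIGN) is assumed to be the character meant in the example.
text = "\xa2"

# ASCII cannot represent U+00A2; 'backslashreplace' substitutes an escape
# sequence instead of raising UnicodeEncodeError.
ascii_bytes = text.encode("ascii", "backslashreplace")

# ISO-8859-1 can represent the character directly as the single byte 0xA2,
# so the error handler is never consulted.
latin1_bytes = text.encode("iso-8859-1", "backslashreplace")

print(ascii_bytes)   # b'\\xa2'
print(latin1_bytes)  # b'\xa2'
```
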
> 
> 
> Printable characters
> --------------------
> 
> The Unicode standard doesn't define Non-printable characters, so we must
> create our own definition. Here we propose to define Non-printable
> characters as follows.
> 
> - Non-printable ASCII characters, as in Python 2.
> 
> - Broken surrogate pair characters.
> 
> - Characters defined in the Unicode character database as
> 
>   * Cc (Other, Control)
>   * Cf (Other, Format)
>   * Cs (Other, Surrogate)
>   * Co (Other, Private Use)
>   * Cn (Other, Not Assigned)
>   * Zl Separator, Line ('\\u2028', LINE SEPARATOR)
>   * Zp Separator, Paragraph ('\\u2029', PARAGRAPH SEPARATOR)
>   * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
>     this category should be escaped to avoid ambiguity.
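
   The definition above, together with the Specification section, can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the reference implementation: is_nonprintable(),
escape_char() and sketch_repr() are hypothetical names, quote selection and
quote escaping are left out, and the category test relies on every listed
category starting with 'C' or 'Z'.

```python
import unicodedata


def is_nonprintable(ch):
    """Return True if ch would be escaped under the proposed definition."""
    if ch == " ":
        return False  # ASCII space is explicitly exempted from Zs
    # Cc, Cf, Cs, Co, Cn, Zl, Zp and Zs all start with 'C' or 'Z'.
    return unicodedata.category(ch)[0] in "CZ"


def escape_char(ch):
    """Escape a single character along the lines of the Specification."""
    short = {"\r": "\\r", "\n": "\\n", "\t": "\\t", "\\": "\\\\"}
    if ch in short:
        return short[ch]
    if not is_nonprintable(ch):
        return ch  # printable characters are kept as they are
    code = ord(ch)
    if code <= 0xFF:
        return "\\x%02x" % code
    if code <= 0xFFFF:
        return "\\u%04x" % code
    return "\\U%08x" % code


def sketch_repr(s):
    # Hypothetical helper: always uses apostrophes and does not escape them.
    return "'" + "".join(escape_char(ch) for ch in s) + "'"


print(sketch_repr("a\u3042\n\u2028"))
```

   Broken surrogates need no special case here because unicodedata reports
them as category Cs.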
> 
> 
> Alternate Solutions
> -------------------
> 
> To help debugging in non-Latin languages without changing repr(), other
> suggestions were made.
> 
> - Supply a tool to print lists or dicts.
> 
>  Strings to be printed for debugging are not only contained by lists or
>  dicts, but also in many other types of object. File objects contain a
>  file name in Unicode, exception objects contain a message in Unicode,
>  etc. These strings should be printed in readable form when repr()ed.
>  It is unlikely to be possible to implement a tool to print all
>  possible object types.
> 
> - Use sys.displayhook and sys.excepthook.
> 
>  For interactive sessions, we can write hooks to restore hex-escaped
>  characters to the original characters. But these hooks are called only
>  to display the result of evaluating an expression entered in an
>  interactive Python session, and don't work for the print() function or
>  for non-interactive sessions.

   Or for logging.debug("%r", ...)

> - Subclass sys.stdout and sys.stderr.
> 
>  It is difficult to implement a subclass to restore hex-escaped
>  characters since there isn't enough information left by the time it's
>  a string to undo the escaping correctly in all cases. For example,
>  ``print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
>  there is no chance to tell file objects apart.
> 
> - Make the encoding used by unicode_repr() adjustable.
> 
>  There is no benefit preserving the current repr() behavior to make
>  application/library authors aware of non-ASCII repr(). And selecting
>  an encoding on printing is more flexible than having a global setting.
> 
> 
> Open Issues
> ===========
> 
> - A lot of people use UTF-8 for their encoding, for example, en_US.utf8
>   and de_DE.utf8. In such cases, the 'backslashreplace' trick doesn't work.

   There is also the problem of similar-looking characters in Western,
Greek and Cyrillic languages. These languages use similar (but different)
alphabets (descended from a common ancestor) and contain letters that
look alike but have different character codes. For example, it is hard to
distinguish Latin 'a', 'e' and 'o' from Cyrillic 'а', 'е' and 'о'. (The
visual representation, of course, depends very much on the fonts used, but
usually these letters are almost indistinguishable.)

> Backwards Compatibility
> =======================
> 
> Changing repr() may break some existing code, especially testing code.
> Five of Python's regression tests fail with this modification. If you
> need repr() strings without non-ASCII characters, as in Python 2, you can
> use the following function.
> 
> ::
> 
>  def repr_ascii(obj):
>      return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")
> 
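
   A short usage sketch for the function quoted above. The definition is
repeated so the snippet runs on its own, and the sample strings are only
illustrations. The result is the same whether repr() escapes the character
itself or leaves it for the encoder's error handler.

```python
def repr_ascii(obj):
    # Same definition as in the quoted draft.
    return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")


plain = repr_ascii("abc")
escaped = repr_ascii("\u3042")  # HIRAGANA LETTER A

print(plain)    # 'abc'
print(escaped)  # '\u3042'
```
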
> 
> Reference Implementation
> ========================
> 
> http://bugs.python.org/issue2630
> 
> 
> References
> ==========
> 
> .. [1] Multibyte string on string::string_print
>        (http://bugs.python.org/issue479898)
> 
> 
> Copyright
> =========
> 
> This document has been placed in the public domain.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From brett at python.org  Tue May  6 08:22:52 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 5 May 2008 23:22:52 -0700
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <1afaf6160805051703y2e3ee58fv5ecddeaaf29dc0c5@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<ca471dc20805051533g2943483cyf45050d80f60508c@mail.gmail.com>
	<bbaeab100805051620w19b8cf65see16cb4ebd213bc1@mail.gmail.com>
	<1afaf6160805051703y2e3ee58fv5ecddeaaf29dc0c5@mail.gmail.com>
Message-ID: <bbaeab100805052322l6a4aad62wb383e3c610ad3a44@mail.gmail.com>

On Mon, May 5, 2008 at 5:03 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> On Mon, May 5, 2008 at 6:20 PM, Brett Cannon <brett at python.org> wrote:
>  > On Mon, May 5, 2008 at 3:33 PM, Guido van Rossum <guido at python.org> wrote:
>  >  >
>
> >  >  I've accepted this PEP.
>  >
>  >  Woohoo!
>
>  Congrats!
>
>
>  >
>  >  > Everyone, get to work on implementing this!
>  >  >  I'm sure some small nits will come up during the work that nobody
>  >  >  anticipated during the PEP discussion. In that case, let's be flexible
>  >  >  and work to update the PEP with the best possible solution.
>  >
>  >  And use the PEP to keep track of what state everything is in!
>  >  Hopefully I will start work on this tonight or tomorrow.
>
>  What can I/we do to help?

Once I have worked out exactly what needs to be done for each possible
thing (deletion, rename), it is just a matter of going through the motions
and doing the right thing for 2.6/3.0. I have an idea on how I want
to test the deletion warnings. Once I have that in place then it
should be a matter of adding the tests to test_py3kwarn, the warning
in the module, and the proper note in the docs.
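
One possible shape for such a warning test, as a sketch only. It is not
taken from test_py3kwarn; removed_in_3() and check_warns() are hypothetical
helpers, and the use of DeprecationWarning as the category is an assumption.

```python
import warnings


def removed_in_3(name):
    """Hypothetical helper: warn that `name` is gone in Python 3.0."""
    warnings.warn("%s has been removed in Python 3.0" % name,
                  DeprecationWarning, stacklevel=2)


def check_warns(func, *args):
    """Call func and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure the warning is not swallowed by an existing filter.
        warnings.simplefilter("always")
        func(*args)
    return [str(w.message) for w in caught
            if issubclass(w.category, DeprecationWarning)]


messages = check_warns(removed_in_3, "os.path.walk")
print(messages)
```
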

-Brett

From glyph at divmod.com  Fri May  2 02:03:24 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 02 May 2008 00:03:24 -0000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next
	Wednesday	07-May-2008
In-Reply-To: <ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
Message-ID: <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>

On 11:45 pm, guido at python.org wrote:
>I like this, except one issue: I really don't like the .local
>directory. I don't see any compelling reason why this needs to be
>~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide
>it from view, especially since the user is expected to manage this
>explicitly.

I've previously given a spirited defense of ~/.local on this list ( 
http://mail.python.org/pipermail/python-dev/2008-January/076173.html ) 
among other places.

Briefly, "lib" is not the only directory participating in this 
convention; you've also got the full complement of other stuff that 
might go into an installation like /usr/local.  So, while "lib" might 
annoy me a little, "bin etc games include lib lib32 man sbin share src" 
is going to get ugly pretty fast, especially if this is what comes up in 
Finder or Nautilus or Explorer every time I open a window.  If it's 
going to be a visible directory on the grounds that this is a
Python-specific thing that is explicitly *not* participating in a convention
with other software, then please call it "~/Python" or something.

Am I the only guy who finds software that insists on visible, fixed 
files in my home directory rude?  vmware, for example, wants a 
"~/vmware" directory, but pretty much every other application I use is 
nice enough to use dotfiles (even cedega, with a
roughly-comparable-to-lib "applications I've installed for you" folder).

Put another way - it's trivial to make ~/.local/lib show up by 
symlinking ~/lib, but you can't make ~/lib disappear, and lots of 
software ends up looking at ~.
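
The symlink point can be sketched like this. A temporary directory stands
in for the home directory so that nothing real is touched; the variable
names are only illustrative, and os.symlink() may need extra privileges on
Windows.

```python
import os
import tempfile

# A throwaway directory plays the role of "~".
with tempfile.TemporaryDirectory() as home:
    hidden_lib = os.path.join(home, ".local", "lib")
    os.makedirs(hidden_lib)

    # Users who want to see the directory add a visible alias for it.
    visible_lib = os.path.join(home, "lib")
    os.symlink(hidden_lib, visible_lib)

    is_link = os.path.islink(visible_lib)
    same_place = os.path.samefile(hidden_lib, visible_lib)

print(is_link, same_place)
```
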

From asmodai at in-nomine.org  Fri May  2 07:07:20 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Fri, 2 May 2008 07:07:20 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
Message-ID: <20080502050720.GO78165@nexus.in-nomine.org>

-On [20080501 22:27], Barry Warsaw (barry at python.org) wrote:
>Time is running short to get any new features into Python 2.6 and  
>3.0.

Is there a reliable way to tell 32-bit and 64-bit Windows apart from within
Python? I have not found any yet, but it might be a mere oversight on my
part.

The reason I ask is that both return win32, which is most likely a reference
to the API, even with the 64-bit Python version installed. Using win32 then
causes some issues with, for example, setuptools, since it generates an egg
with a win32 identifier. If you have Python C extension code, it will be
compiled for 64-bit, and thus will not work on 32-bit Windows.
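
For illustration, a sketch of a few standard-library checks that report the
width of the running interpreter. This is not necessarily what setuptools
should use, and it describes the interpreter build, not the operating
system. interpreter_bits() is a hypothetical helper name.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Pointer width of the running interpreter, in bits."""
    return struct.calcsize("P") * 8


print(sys.platform)                 # the same string for 32- and 64-bit Windows builds
print(interpreter_bits())           # 32 or 64
print(platform.architecture()[0])   # a string such as '64bit'
print(platform.machine())           # hardware name as reported by the OS
```
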

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
All are lunatics, but he who can analyze his delusions is called a
philosopher.

From glyph at divmod.com  Fri May  2 05:25:49 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 02 May 2008 03:25:49 -0000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next
	Wednesday	07-May-2008
In-Reply-To: <ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
Message-ID: <20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>

On 01:55 am, guido at python.org wrote:
>On Thu, May 1, 2008 at 5:03 PM,  <glyph at divmod.com> wrote:

Hi everybody.  I apologize for writing yet another lengthy screed about 
a simple directory naming issue.  I feel strongly about it but I 
encourage anyone who doesn't to simply skip it.

First, some background: my strong feelings here are actually based on an 
experience I had a long time ago when helping someone with some C++ 
programming homework.  They were baffled because when I helped them the 
programs compiled, but then as soon as they tried it on their own it 
didn't.  The issue was that I had replicated my own autotools-friendly 
directory structure for them (at the time, "~/bin", "~/include", 
"~/lib", "~/etc", and so on managed with GNU stow) onto their machine 
and edited their shell setup to include them appropriately.  But, as 
soon as I was finished, they "cleaned up" the "mess" I had left behind, 
and thereby removed all of their build dependencies.  This was on a 
shared university build server, before the days of linux as a friendly, 
graphical operating system which encouraged you to look even more 
frequently at your home directory, so if anything I suspect the 
likelihood that this is a problem would be worse now.  Since cleaning up 
my own home directory, of course, I find that I appreciate the lack of 
visual noise in Nautilus et al. as well.

Also, while I obviously think all tools should work this way, I think 
that Python in particular will attract an audience who is learning to 
program but not necessarily savvy with arcane nuances of filesystem 
layout, and it would be best if those details were abstracted.

My concern here is for the naive python developer reading installation 
instructions off of a wiki and trying to get started with Twisted 
development.  Seeing a directory created in your home directory (or, as 
the case may be, 3 directories, "bin", "lib", and "include") is a bit of 
a surprise.  They don't actually care where the files in their installed 
library are, as long as they're "installed", and they can import them. 
However, they may care that clicking on the little house icon now shows 
not "Pictures", "Movies", etc, but "lib" (what's a 'lib'?) "bin" (what's 
a bin?  is that like a box where I throw my stuff?) "share" (I put my 
stuff in "share", but it's not shared.  Wait, I'm supposed to put it in 
"Public"?).
>>  Briefly, "lib" is not the only directory participating in this convention;
>>you've also got the full complement of other stuff that might go into an
>>installation like /usr/local.  So, while "lib" might annoy me a little, "bin
>>etc games include lib lib32 man sbin share src" is going to get ugly pretty
>>fast, especially if this is what comes up in Finder or Nautilus or Explorer
>>every time I open a window.
>
>Unless I misread the PEP, there's only going to be a lib subdirectory.
>Python packages don't put stuff in other places AFAIK.

Python packages, at the very least, frequently put stuff in "bin" (or
"scripts", I think, on Windows).  Not all Python packages are pure-Python
packages either; setup.py boasts --install-platlib, --install-headers,
--install-data, and --exec-prefix options, which suggests an
"include", "bin", and "share" directory, at least.  I'm sure if I had
more time to grovel around I'd find one that installed manpages. 
Twisted has some, but apparently setup.py doesn't do anything with them, 
we leave that to the OS packages...

Of course, very little of this is handled by the PEP.  But even the 
usage of the name "lib" implies that the PEP is taking some care to be 
compatible with an idiom that goes beyond Python itself here, or at 
least beyond simple Python packages.

Even assuming that no Python library ever wanted to install any of these 
things, there are many Python libraries which are simply wrappers around 
lower-level libraries, and if I want to perform a per-user install of 
one of those, I am going to ./configure --prefix=~/something (and by 
"something", I mean ".local" ;)) and it would be nice to have Python 
living in the same space.  For that matter it'd be nice to get autotools 
and Ruby and PHP and Perl and Emacs (ad nauseam) all looking at ~/.local
as a mirror of /usr, so that I didn't have to write a bunch of shell 
bootstrap glue to get everything to behave consistently, or learn the 
new, special names for bits of configuration under "~" that are 
different from the ones under /usr/local or /etc.
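
The "mirror of /usr" idea can be sketched as a simple mapping. SUBDIRS is a
hypothetical subset of the directory names mentioned in this thread, and
prefix_layout() is an illustrative helper, not an existing API.

```python
import os

# Hypothetical subset of the autotools-style subdirectories named above.
SUBDIRS = ("bin", "include", "lib", "share", "man")


def prefix_layout(prefix):
    """Map each conventional subdirectory name to its path under prefix."""
    root = os.path.expanduser(prefix)
    return {name: os.path.join(root, name) for name in SUBDIRS}


system = prefix_layout("/usr/local")
per_user = prefix_layout("~/.local")

# Every system-wide location has a per-user counterpart with the same name.
for name in SUBDIRS:
    print(system[name], "<->", per_user[name])
```
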

I replicate a consistent Python development environment with a ton of 
bizarre dependencies across something like 15 different OS installations 
(not to mention a bevy of virtual machines I keep around just for fun), 
so I think about these issues a lot.  Most of these machines are macs 
and linux boxes, but I do my best on Windows too.  FWIW I don't have any 
idea what the right thing to do is on Windows; ".local" doesn't 
particularly make sense, but neither does "lib" in that context. 
There's no reasonable guess as to where to put scripts, or dependent 
shared libraries... but then, per-user installation is less of an issue 
on Windows.
>On the Mac, the default Finder window is not your home directory but
>your Desktop, which is a subdirectory thereof with a markedly public
>name. In fact, OS X has a whole bunch of reserved names in your home
>directory, and none of them start with a dot. The rule seems to be
>that if it contains stuff that the user cares about, it doesn't start
>with a dot.

Hmm.  On my Mac laptop, the default Finder window is definitely my home 
directory; this may be an artifact of many OS upgrades or some tweak 
that I performed a long time ago and forgot about, though.  Apologies if 
that is not the average user experience.

For what it's worth, Ubuntu also has some directories that it creates: 
Desktop, Pictures, Documents, Examples, Templates, Videos.  These are 
empty, and I typically delete the ones I don't use.
>>If it's going to be a visible directory on the
>>grounds that this is a Python-specific thing that is explicitly *not*
>>participating in a convention with other software, then please call it
>>"~/Python" or something.
>
>Much better than ~/.local/ IMO.

It depends how this is being perceived.  If this is Python mirroring the 
/usr/local layout convention for users, as the name "lib" implies, then 
this is worse.  However, if Python is just trying to select a location 
for its own library bookkeeping and not allow the installation of 
platform libraries or scripts using this mechanism... well, ~/.python.d 
would still be my preference ;-) but I could at least understand 
"Python" as mirroring the Mac, GNOME and KDE convention for a few very 
special directories.
>>  Am I the only guy who finds software that insists on visible, fixed files
>>in my home directory rude?  vmware, for example, wants a "~/vmware"
>>directory, but pretty much every other application I use is nice enough to
>>use dotfiles (even cedega, with a roughly-comparable-to-lib "applications
>>I've installed for you" folder).
>
>The distinction to my mind is that most dot files (with the exception
>of a few like .profile or .bashrc) are not managed by most users --
>the apps that manage them provide APIs for manipulating their
>contents.  (Sort of like the Windows registry.)  Non-dot files are for
>stuff that the user needs to be aware of.

My experience of modern Linux suggests that the usage you're describing 
is gradually being phased out - applications that want to manage some 
non-user-visible storage in something like the registry increasingly use 
gconf (or a database, in server-land).  Granted, gconf itself is stored 
in dotfiles, but it's just a few.

In my home directory I have, in version control, variously written by 
hand or databases maintained from externally downloaded stuff:

    ~/.asoundrc
    ~/.emacs
    ~/.vimrc
    ~/.vim
    ~/.Xresources
    ~/.fonts
    ~/.gnomerc
    ~/.inputrc
    ~/.bashrc
    ~/.bash_profile
    ~/.profile
    ~/.screenrc
    ~/.ssh/config
    ~/.ssh/authorized_keys
    ~/.ssh/known_hosts

I know about these dot files and I care about them and I maintain them, 
but they're there for the benefit of particular pieces of software, not 
me.  There are a lot of other dotfiles there, but I don't think that 
this set is "a few";   I am quite happy that I don't have to see every 
one of them every time I am looking at my home directory in a "save as" 
dialog.
>I'm not sure where Python packages fall, but ISTM that this is
>something a user must explicitly choose as the target of an installer.
>The user is also likely to have to dig through there to remove stuff,
>as Python package management doesn't have a way to remove packages.

I hope that users never have to explicitly choose this as the target of 
the installer; I was under the impression that the point of adding this 
feature was to allow the default behavior of distutils to work simply 
and automatically on UNIX-y platforms rather than puking about 
permissions, or requiring arcana like  "sudo" access or editing your 
shell's startup.  I am quietly agitating elsewhere to get ~/.local/bin 
added to $PATH by default, by the way ;-).  (~/.local/lib on 
$LD_LIBRARY_PATH is a hard sell, but that too...)

Once you have to know about it and explicitly choose it it's not much 
more work to set all the appropriate shell environment variables 
yourself.  And, for that matter, *I* already have, so I suppose 
regardless of the outcome of this discussion I'll still have a ~/.local 
:-).
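
As a rough sketch of what "setting the appropriate variables yourself"
amounts to, here expressed in Python instead of shell startup code.
with_user_prefix() is a hypothetical helper, and the two variables handled
are only the ones named above.

```python
import os


def with_user_prefix(env, prefix):
    """Return a copy of env with prefix/bin and prefix/lib searched first."""
    updated = dict(env)
    for var, subdir in (("PATH", "bin"), ("LD_LIBRARY_PATH", "lib")):
        entry = os.path.join(prefix, subdir)
        # Drop empty entries so an unset variable does not yield a stray separator.
        existing = [p for p in updated.get(var, "").split(os.pathsep) if p]
        if entry not in existing:
            existing.insert(0, entry)
        updated[var] = os.pathsep.join(existing)
    return updated


env = with_user_prefix({"PATH": "/usr/bin"}, "/home/me/.local")
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

Applying the helper twice leaves the result unchanged, which matters for
shell startup files that may be sourced more than once.
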
>>  Put another way - it's trivial to make ~/.local/lib show up by symlinking
>>~/lib,
>
>That's not the same thing at all.

I'm not sure what you're saying it's not the same as.  All I'm saying is 
that if advanced users want to show it, they'll symlink it; if naive 
users want to hide it, they'll delete it and break python, possibly 
without knowing why ;).
>>but you can't make ~/lib disappear, and lots of software ends up
>>looking at ~.
>
>But what software cares about another file there? My home directory is
>mostly a switching point where I have quick access to everything I
>access regularly.

Nothing's going to break, if that's what you mean.  No software 
processes the list of ~ and does anything with it; but lots of stuff 
shows me that list.  In GNOME, on Ubuntu, when a "choose file" dialog 
comes up, 80% of the time it comes up by default in my home directory. 
When I open a terminal it opens in my home directory.  The default 
location for Emacs is my home directory.  I can quickly measure my 
cognitive load by looking at the contents of that directory.  Since my 
shell starts there, autocomplete starts there, and so common-letter real 
estate is scarce.  I have a directory called "Projects" that I currently 
autocomplete with 'p<tab>' and a directory called 'Linux' that I 
autocomplete with 'l<tab>'; either public-name proposal will have me 
typing an additional letter on these every day ;-).

In other words, I care about another file there.  I use my home 
directory as a sort of to-do list; it's mostly empty unless I have a lot 
going on, in which case it fills up with various objects I'm working on, 
and then I empty it out again.  There are a few exceptions to this rule; 
on every platform there are a few things the OS puts there, but they are 
generally things like "Pictures", "Desktop", and "Music"... where I put 
pictures, downloaded files, and music. The Mac's "Library" directory has 
never bothered me, since it's OS-provided and basically an alternate 
location for dotfiles.  ("Application Data" and friends are another 
story.)

In a way, I agree with you.  "everything I access regularly" is a good 
description of my home directory.  Except, this "lib" directory is not 
something I want to access regularly; very occasionally, maybe once 
every few weeks, I want to chuck some dependency in there and then 
forget about it for a year.

From glyph at divmod.com  Fri May  2 07:48:17 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 02 May 2008 05:48:17 -0000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next
	Wednesday	07-May-2008
In-Reply-To: <ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
Message-ID: <20080502054817.25821.967243921.divmod.xquotient.7459@joule.divmod.com>


On 03:49 am, guido at python.org wrote:
>I stand corrected on a few points. You've convinced me that ~/lib/ is
>wrong. But I still don't like ~/.local/; not in the last place because
>it's not any more local than any other dot files or directories. The
>"symmetry" with /usr/local/ is pretty weak, and certainly won't help
>beginning users.

Why do you say the symmetry is weak?  The name might not be that 
evocative, but the main thrust of what I'm saying is that "~/.<x>" 
should be an autotools-style directory layout.  The symmetry I suggest 
is in exactly that sense; that's what /usr/local is.  I don't actually 
care what "<x>" is, except I (and many others) already use "local" for 
that value, and the more software that honors it, the better.  GNU stow 
(arguably the king of per-user installation management) suggests ~/local 
as an autotools --prefix target; the free desktop project implicitly 
suggests ~/.local (by suggesting ~/.local/share is a place to put the 
same files that would normally be searched for in /usr/share and 
/usr/local/share).  So the word "local" is just floating around in this 
meme space; I don't like the word that much, but I don't see that 
there's a different one which more clearly evokes the concept either.  I 
originally used "~/UNIX" and then ~/.unix, but switched to .local when I 
noticed other folks doing it.  One I've actually seen mentioned a few 
times is "~/.nix-config", which I certainly don't think is any better.

It would help beginning users if ~/.local/bin and ~/.local/lib were 
honored by the system.  I, and other adherents of this idea that it 
would be nice if users could install source without admin privs, have 
been suggesting that to distro guys when I (we) can, and I figure in a 
few years, somebody might bite.  If that happens, it will start being 
*easier* to build stuff from source into a separated location than to 
need root, stomp on the system, and inevitably break some stuff. 
Agitating for ~/Python/Platform/Libraries on $LD_LIBRARY_PATH (or 
equivalent) is a lot harder to do with a straight face.

This is the reason I'm bothering to spill so many pixels on this topic; 
I think it would be great if Python were the first real adopter of this 
convention, and once *one* project has really gone full bore, each 
subsequent one is progressively easier to convince.  However, if you've 
made up your mind on ~/Python, I think I've more than made my case at 
this point, so I'll stop cluttering up the lists :).

(By the way, for what it's worth: I _hate_ the 
bin/lib/etc/man/src/include naming convention mess, but it's a mess 
which is programmatically honored in like a hundred billion lines of 
code.  This is why I want it supported, but hidden ;).)
>As a compromise, I'm okay with ~/Python/. I would like to be able to
>say that the user explicitly has to set an environment variable in
>order to benefit from this feature, just like with $PYTHONPATH and
>$PYTHONSTARTUP. But that might defeat the point of making this easy to
>use for noobs.

Is there another point?  It seems to me that this change is entirely 
about shared conventions and "works by default" behavior.  If you are 
going to set an environment variable, set PYTHONPATH; it's already much 
more flexible.

~/Python opens up some new problems though, although perhaps they are 
trivially resolved: how should this interoperate with distutils?  'Just 
make "python setup.py --user" do what "python setup.py --prefix 
~/.local" would do' is pretty straightforward, but "~/Python" would need 
a new convention.  Should "~/Python" have a "~/Python/Scripts" directory 
that one could add to $PATH?  A "~/Python/Platform" directory, for 
includes, libraries, other random junk like manpages or HTML docs? 
~/Python/2.6/lib, or ~/Python2.6/lib?

To be fair, a separate, and purpose-designed Python directory layout 
might also make certain things neater.  For example one could support 
parallel installation with Python2.6 (or Python/2.6) by giving each a 
'lib' and 'bin' directory, and always having the scripts in the 2.6/bin 
dir invoke the 2.6 interpreter, rather than having separated space for 
libraries but having to mangle the names of scripts ("twistd8.0-py2.6"). 
I'd still prefer compatibility-by-convention with other tools, 
languages, etc, though.  In the long term, if everyone followed suit on 
~/.local, that would be great.  But I don't want a ~/Python, ~/Java, 
~/Ruby, ~/PHP, ~/Perl, ~/OCaml and ~/Erlang and a $PATH as long as my 
arm just so I can run a few applications without system-installing them.
>On OS X I think we should put this somewhere under ~/Library/. Just
>put it in a different place than where the Python framework puts its
>stuff.

Isn't the whole point that it should be the same place?  Under current 
Python releases, OS X already has this functionality via 
~/Library/Python/2.5/site-packages.

Also, I'd strongly suggest supporting both ~/Library (although the 
existing location seems fine to me) *and* whatever the default is on 
other platforms; there are already enough points of pain where OS X 
behaves "kind of like a UNIX, but not really", and every project needs 
to add these little workarounds and caveats in the documentation.  Is 
there a benefit to be derived from making this situation worse by 
introducing another such subtlety?

From steve at holdenweb.com  Fri May  2 10:49:17 2008
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 02 May 2008 04:49:17 -0400
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
Message-ID: <481AD58D.2010201@holdenweb.com>

Guido van Rossum wrote:
> I stand corrected on a few points. You've convinced me that ~/lib/ is
> wrong. But I still don't like ~/.local/; not in the last place because
> it's not any more local than any other dot files or directories. The
> "symmetry" with /usr/local/ is pretty weak, and certainly won't help
> beginning users.
> 
So it's the *name* you don't like rather than the invisibility?

> As a compromise, I'm okay with ~/Python/. I would like to be able to
> say that the user explicitly has to set an environment variable in
> order to benefit from this feature, just like with $PYTHONPATH and
> $PYTHONSTARTUP. But that might defeat the point of making this easy to
> use for noobs.
> 
Groan. Then everyone else realizes what a "great idea" this is, and we 
see ~/Perl/, ~/Ruby/, ~/C# (that'll screw the Microsoft users, a 
directory with a comment marker in its name), ~/Lisp/ and the rest? I
don't think people would thank us for that in the long term.

I'm about +10 on invisibility, for the simple reason that "hiding the 
mechanism" is the right thing to do for naive users, who are the most 
likely to screw things up if given the chance and the most likely to be 
unaware of dot-name directories. If you don't like ~/.local/ then please 
consider ~/.private/ or ~/.personal/ or something else, but don't 
gratuitously add a visible subdirectory.

> On OS X I think we should put this somewhere under ~/Library/. Just
> put it in a different place than where the Python framework puts its
> stuff.
> 
Nothing to say about OS X.

One day Windows might start to respect the "hidden dot" convention, but 
perhaps in the interim we could create a (Windows-hidden) ~/.private/? 
Assuming we could work out where to put it ;-)

> On Thu, May 1, 2008 at 8:25 PM,  <glyph at divmod.com> wrote:
[much good sense]

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/

From glyph at divmod.com  Fri May  2 20:34:35 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Fri, 02 May 2008 18:34:35 -0000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next
	Wednesday	07-May-2008
In-Reply-To: <B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
Message-ID: <20080502183435.25821.905798949.divmod.xquotient.7501@joule.divmod.com>


On 05:53 pm, fdrake at acm.org wrote:
>On May 1, 2008, at 7:54 PM, Barry Warsaw wrote:
>>Interesting.  I'm of the opposite opinion.  I really don't want 
>>Python dictating to me what my home directory should look like (a  dot 
>>file doesn't count because so many tools conspire to hide it  from 
>>me).  I guess there's always $PYTHONUSERBASE, but I think I  will not 
>>be alone. ;)

>Using ~/.local/ for user-managed content doesn't seem right to me at 
>all, because it's hidden by default.

I don't understand your reason for saying this.  Terms like "user" and 
"manage" are somewhat vague.  What sort of experience are you hoping to 
provide what sort of user with this convention?  I hope my earlier 
explanations were clear as far as the types of users.

I believe that the management of ~/.local/ is a subtle question.  It 
will largely be "managed" by simply telling distutils to put files 
there; I hope, implicitly.  In my mind there are 2 types of users who 
will be "managing" it - newbies, who don't really know what's going on 
but want "cd mypackage-0.0.1; python setup.py install; python -c 'import 
mypackage'" (or perhaps even "easy_install mypackage") to work, and 
advanced users who want to be able to mix-and-match different versions 
of different packages.  Advanced users might already have a PYTHONPATH 
management scheme (virtual python, virtualenv, combinator, ~/.bashrc hacks, a 
directory full of symlinks) that already works for them, or be 
comfortable with inspecting a hidden directory, so ~/.local isn't a 
problem for them (i.e. us); newbies don't want to see the directory 
until they already know what's going on.

>I'd be even happier if there were no default per-user location, but a 
>required configuration setting (in the existing distutils config 
>locations) in order to enable per-user installation.

If you're happier without this feature, then perhaps your tastes run 
counter to a useful implementation of it :).  Why wouldn't you want it, 
though?  PYTHONPATH still exists; you don't have to use it, personally.

From andy at hexten.net  Fri May  2 21:39:47 2008
From: andy at hexten.net (Andy Armstrong)
Date: Fri, 2 May 2008 20:39:47 +0100
Subject: [Python-3000] Invitation to try out open source code review tool
Message-ID: <8DDD3FFD-BFF8-43EE-9F02-CA4013A3037C@hexten.net>

Hi Guido,

I'm afraid I've added a Perl based project (Test::Harness). I then  
went back and read your post and got to the bit where you specifically  
invited *Python* developers. Sorry about that. I'm not trying to  
colonise Pythonspace with Perl, honest :)

-- 
Andy Armstrong, Hexten


From glyph at divmod.com  Sun May  4 15:58:03 2008
From: glyph at divmod.com (glyph at divmod.com)
Date: Sun, 04 May 2008 13:58:03 -0000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next
	Wednesday	07-May-2008
In-Reply-To: <45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
	<18460.20940.882777.235301@montanaro-dyndns-org.local>
	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>
Message-ID: <20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>


On 3 May, 11:34 pm, fdrake at acm.org wrote:
>On May 3, 2008, at 7:51 AM, skip at pobox.com wrote:
>>Fred asked for a --prefix flag (which is what I was voting on).  I 
>>don't
>>really care what you do by default as long as you give me a way to  do 
>>it
>>differently.
>
>What's most interesting (to me) is that no one's commented on my note 
>that my preferred approach would be that there's no default at all; 
>the location would have to be specified explicitly.  Whether on the 
>command line or in the distutils configuration doesn't matter, but 
>explicitness should be required.

I thought I responded to it in my initial response, but let me be 
clearer.

First, Skip, I *only* care about the default behavior.  There's already 
a way to do it differently: PYTHONPATH.  So, Fred, I think what you're 
arguing for is to drop this feature entirely.  Or is there some other 
use for a new way to allow users to explicitly add something to 
sys.path, aside from PYTHONPATH?  It seems that it would add more 
complexity and I can't see what the value would be.

As I've said a dozen times in this thread already, the feature I'd like 
to get from a per-user installation location is that 'setup.py install', 
or at least some completely canonical distutils incantation, should 
work, by default, for non-root users; ideally non-administrators on 
windows as well as non-root users on unixish platforms.

From mal at egenix.com  Sun May  4 18:31:09 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Sun, 04 May 2008 18:31:09 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <481DE0CE.8010306@cheimes.de>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>	<18459.43976.85481.758104@montanaro-dyndns-org.local>	<481C2AE7.9010805@gmail.com>	<18460.20940.882777.235301@montanaro-dyndns-org.local>	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
	<481DE0CE.8010306@cheimes.de>
Message-ID: <481DE4CD.7070401@egenix.com>

On 2008-05-04 18:14, Christian Heimes wrote:
>> First, Skip, I *only* care about the default behavior.  There's already
>> a way to do it differently: PYTHONPATH.  So, Fred, I think what you're
>> arguing for is to drop this feature entirely.  Or is there some other
>> use for a new way to allow users to explicitly add something to
>> sys.path, aside from PYTHONPATH?  It seems that it would add more
>> complexity and I can't see what the value would be.
> 
> PYTHONPATH is lacking one feature which is important for lots of
> packages and setuptools. The directories in PYTHONPATH are just added to
> sys.path. But setuptools requires a site package directory. Maybe a new
> env var PYTHONSITEPATH could solve the problem.

We don't need another setup variable for this. Just place a
well-known module into the site-packages/ directory and then
query its __file__ attribute, e.g.

site-packages/site_packages.py

The module could even include a few helpers to query various
settings which apply to the site packages directory, e.g.

site_packages.get_dir()
site_packages.list_packages()
site_packages.list_modules()
etc.
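
A minimal sketch of what such a module could look like. The function
names follow the list above; the bodies, the optional arguments and the
rule for what counts as a package (a subdirectory containing
__init__.py) are assumptions made for illustration, not an existing API.

```python
import os


def get_dir(anchor=None):
    """Return the directory holding this module (or `anchor`, if given)."""
    path = anchor if anchor is not None else __file__
    return os.path.dirname(os.path.abspath(path))


def list_packages(directory=None):
    """Names of subdirectories that contain an __init__.py (assumed rule)."""
    directory = directory or get_dir()
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name, "__init__.py"))
    )


def list_modules(directory=None):
    """Names of plain .py files in the directory, without the extension."""
    directory = directory or get_dir()
    return sorted(
        name[:-3] for name in os.listdir(directory)
        if name.endswith(".py") and os.path.isfile(os.path.join(directory, name))
    )
```

Dropped into site-packages/ as site_packages.py, site_packages.get_dir()
would then report the directory the module was imported from.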

>> As I've said a dozen times in this thread already, the feature I'd like
>> to get from a per-user installation location is that 'setup.py install',
>> or at least some completely canonical distutils incantation, should
>> work, by default, for non-root users; ideally non-administrators on
>> windows as well as non-root users on unixish platforms.
> 
> The implementation of my PEP provides a new option for install:
> 
> $ python setup.py install --user
> 
> Is it sufficient for you?

Just in case you don't know...

python setup.py install --home=~

will install to ~/lib/python

The problem is not getting the packages installed in a non-admin
location. It's about Python looking in a non-admin location per
default (as well as in the site-packages location).
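
To illustrate what "looking in" a location as a site directory means,
compared with a plain PYTHONPATH-style entry, here is a small sketch
using the standard site.addsitedir() call. The temporary directory, the
"extra" subdirectory and the demo.pth file are made up for the demo.

```python
import os
import site
import sys
import tempfile

# Build a throwaway "site directory" containing one .pth file that
# points at a subdirectory (all names here are invented for the demo).
site_dir = tempfile.mkdtemp()
extra_dir = os.path.join(site_dir, "extra")
os.mkdir(extra_dir)
with open(os.path.join(site_dir, "demo.pth"), "w") as f:
    f.write("extra\n")

# PYTHONPATH-style: the directory is searched, but demo.pth is ignored.
sys.path.append(site_dir)
pth_seen_by_plain_entry = extra_dir in sys.path

# Site-directory style: .pth files inside the directory are processed too.
site.addsitedir(site_dir)
pth_seen_by_site_dir = extra_dir in sys.path
```

Only the second step puts the directory named in demo.pth on sys.path,
which is the behaviour tools relying on .pth files depend on.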

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 04 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From mal at egenix.com  Sun May  4 22:56:34 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Sun, 04 May 2008 22:56:34 +0200
Subject: [Python-3000] Reminder: last alphas next Wednesday 07-May-2008
In-Reply-To: <481E1526.6000903@cheimes.de>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>	<18459.43976.85481.758104@montanaro-dyndns-org.local>	<481C2AE7.9010805@gmail.com>	<18460.20940.882777.235301@montanaro-dyndns-org.local>	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>	<481DE0CE.8010306@cheimes.de>	<481DE4CD.7070401@egenix.com>
	<481E1526.6000903@cheimes.de>
Message-ID: <481E2302.8000509@egenix.com>

On 2008-05-04 21:57, Christian Heimes wrote:
> M.-A. Lemburg schrieb:
>>> PYTHONPATH is lacking one feature which is important for lots of
>>> packages and setuptools. The directories in PYTHONPATH are just added to
>>> sys.path. But setuptools requires a site package directory. Maybe a new
>>> env var PYTHONSITEPATH could solve the problem.
>> We don't need another setup variable for this. Just place a
>> well-known module into the site-packages/ directory and then
>> query its __file__ attribute, e.g.
>>
>> site-packages/site_packages.py
>>
>> The module could even include a few helpers to query various
>> settings which apply to the site packages directory, e.g.
>>
>> site_packages.get_dir()
>> site_packages.list_packages()
>> site_packages.list_modules()
>> etc.
> 
> I don't see how it is going to solve the use case "Add another site
> package directory when I don't have write access to the global site
> package directory and I don't want to modify my apps."

No, but it's going to solve the issue "which of the sys.path directories
is to be considered the site packages directory". I was under the
impression that this is what you were after.

>> Just in case you don't know...
>>
>> python setup.py install --home=~
>>
>> will install to ~/lib/python
>>
>> The problem is not getting the packages installed in a non-admin
>> location. It's about Python looking in a non-admin location per
>> default (as well as in the site-packages location).
> 
> I know the --home option. For one, the --home option is Unix-only and not
> supported on Windows. Also, the --user option takes all options of my PEP
> 370 user site directory into account, including the PYTHONUSERBASE env var.

Ok. Just wanted to mention that there is a precedent in distutils
for doing user home directory installations.
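
As a rough sketch of the lookup being discussed: an explicit
PYTHONUSERBASE setting wins, otherwise a default under the home
directory is used. The helper name is hypothetical, and ~/.local is
only the default under discussion in this thread, not a settled choice.

```python
import os


def user_base(environ=None, home=None):
    """Pick the per-user base directory (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    override = environ.get("PYTHONUSERBASE")
    if override:
        # An explicit setting always takes precedence over the default.
        return override
    home = os.path.expanduser("~") if home is None else home
    # Assumed default; the thread is still debating the actual name.
    return os.path.join(home, ".local")
```

Passing the environment and home directory as arguments keeps the
helper easy to exercise without touching the real user account.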

-- 
Marc-Andre Lemburg
eGenix.com


From asmodai at in-nomine.org  Fri May  2 11:16:33 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Fri, 2 May 2008 11:16:33 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <481AD771.6040802@cheimes.de>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
	<481AD58D.2010201@holdenweb.com> <481AD771.6040802@cheimes.de>
Message-ID: <20080502091633.GV78165@nexus.in-nomine.org>

-On [20080502 11:00], Christian Heimes (lists at cheimes.de) wrote:
>Windows and Mac OS X have dedicated directories for application specific
>libraries. That is ~/Library on Mac and Application Data on Windows. The
>latter is i18n-ed and called "Anwendungsdaten" in German. Fortunately
>Windows sets an environment var to the application data directory.

And Vista has C:\ProgramData\{vendor}\{application}, which is *not*
$APPDATA, but $ProgramData. $APPDATA points to
C:\Users\{user}\AppData\Roaming on Vista -- which is very different.

"Windows uses the Roaming folder for application specific data, such as
custom dictionaries, which are machine independent and should roam with the
user profile. The AppData\Roaming folder in Windows Vista is the same as the
Documents and Settings\username\Application Data folder in Windows XP."

I think that's different from what you meant above though, since I doubt
you'd want this (the libraries) to roam with the user.

See
http://download.microsoft.com/download/3/b/a/3ba6d659-6e39-4cd7-b3a2-9c96482f5353/Managing%20Roaming%20User%20Data%20Deployment%20Guide.doc
for more background.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Seek not death in the error of your life: and pull not upon yourselves
destruction with the works of your hands...

From asmodai at in-nomine.org  Fri May  2 11:20:08 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Fri, 2 May 2008 11:20:08 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <481AD58D.2010201@holdenweb.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
	<481AD58D.2010201@holdenweb.com>
Message-ID: <20080502092008.GW78165@nexus.in-nomine.org>

-On [20080502 10:50], Steve Holden (steve at holdenweb.com) wrote:
>Groan. Then everyone else realizes what a "great idea" this is, and we see 
>~/Perl/, ~/Ruby/, ~/C# (that'll screw the Microsoft users, a directory with 
>a comment marker in its name), ~/Lisp/ and the rest? I don't think people 
>would thank us for that in the long term.

I'm +1 on just using $HOME/.local, but otherwise $HOME/.python makes sense
too. $HOME/.python.d doesn't do it for me, too clunky (and hardly used if I
look at my .files in $HOME).

But I agree with Steve that it should be a hidden directory.

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
????? ?????? ??? ?? ??????
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Cum angelis et pueris, fideles inveniamur. Quis est iste Rex gloriae..?

From asmodai at in-nomine.org  Fri May  2 15:11:31 2008
From: asmodai at in-nomine.org (Jeroen Ruigrok van der Werven)
Date: Fri, 2 May 2008 15:11:31 +0200
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <481B0DE6.30406@lemurconsulting.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<ca471dc20805011855j691b64f2j6807dee7ae1ee489@mail.gmail.com>
	<20080502032549.25821.1219840827.divmod.xquotient.7352@joule.divmod.com>
	<ca471dc20805012049q9c67019r91376224500f7845@mail.gmail.com>
	<481AD58D.2010201@holdenweb.com>
	<20080502092008.GW78165@nexus.in-nomine.org>
	<481B0DE6.30406@lemurconsulting.com>
Message-ID: <20080502131131.GD78165@nexus.in-nomine.org>

-On [20080502 14:49], Richard Boulton (richard at lemurconsulting.com) wrote:
>So, on Ubuntu computers at least, it seems likely that a $HOME/.local/
>directory will already exist, with the beginnings of a unix style layout
>inside it.

On my Ubuntu 8 box:

[15:11] [ruigrok at akuma] (0) {0} % ls ~/.local   
share

-- 
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
????? ?????? ??? ?? ??????
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
The only source of knowledge is experience...

From ncoghlan at gmail.com  Tue May  6 12:41:32 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 06 May 2008 20:41:32 +1000
Subject: [Python-3000] PEP 3138 - String representation in Python 3000
In-Reply-To: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
Message-ID: <482035DC.5060906@gmail.com>

Atsuo Ishimoto wrote:
> I've written a PEP for new string representation in Python 3000.

+1 from me - with this PEP in place getting the old repr() behaviour 
back is fairly straightforward (as shown in the PEP), but it's hard to 
get the unicode-friendly repr() behaviour any other way (because the 
current repr() loses too much information, as demonstrated fairly 
thoroughly in the python-dev thread that inspired the PEP).
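
A minimal sketch of recovering the ASCII-only form, assuming an
interpreter whose repr() already leaves printable non-ASCII characters
unescaped as the PEP proposes. The helper name is illustrative.

```python
def ascii_repr(obj):
    """repr() with every non-ASCII character backslash-escaped."""
    # Encoding with 'backslashreplace' turns anything outside ASCII
    # into \xXX, \uXXXX or \UXXXXXXXX escapes; decoding gives a str again.
    return repr(obj).encode("ascii", "backslashreplace").decode("ascii")
```

Going the other way, from an escaped repr() back to readable text, has
no comparably simple one-liner, which is the asymmetry described above.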

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From barry at python.org  Tue May  6 13:26:20 2008
From: barry at python.org (Barry Warsaw)
Date: Tue, 6 May 2008 07:26:20 -0400
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next
	Wednesday	07-May-2008
In-Reply-To: <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
Message-ID: <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 1, 2008, at 8:03 PM, glyph at divmod.com wrote:
>
> Am I the only guy who finds software that insists on visible, fixed  
> files in my home directory rude?  vmware, for example, wants a "~/ 
> vmware" directory, but pretty much every other application I use is  
> nice enough to use dotfiles (even cedega, with a roughly-comparable- 
> to- lib "applications I've installed for you" folder).

No Glyph, you are not alone!  I don't even like the OS putting stuff  
like Pictures, Music, Movies, Videos and Desktop in my home directory,  
but I guess that's the price we pay for a modrin desktopy operatin'  
systum.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCBAXHEjvBPtnXfVAQJdrgP+Mw0qZebL+MqUk3wKMsRt5mHzT/uHhQ0Z
NVwyooWKWnvLMMifCbaG3pjVs7MehfcbAK8uLTlF8Ss9/w1Q5SWJkdhLMWOvHdA6
CJMvGyuokElD5e2cKXiakUWUshN/CeGNElTpxHUBdwmkirfXLQzQll9jlYbnr0I8
du2+rTj/oAc=
=015L
-----END PGP SIGNATURE-----

From rasky at develer.com  Tue May  6 13:41:27 2008
From: rasky at develer.com (Giovanni Bajo)
Date: Tue, 6 May 2008 11:41:27 +0000 (UTC)
Subject: [Python-3000] Removal of os.path.walk
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<ca471dc20804292042l344ddd9cs5f16296e63793a30@mail.gmail.com>
	<20080430144804.GA26439@panix.com> <4818F7C3.7060806@v.loewis.de>
	<ca471dc20804301602n79f6dd04r957e2b6817a5d7b1@mail.gmail.com>
	<481955B5.2030805@v.loewis.de> <48199954.4000800@gmail.com>
	<ca471dc20805010858h417c2d7dra0caf524cbba77da@mail.gmail.com>
Message-ID: <fvpg57$qfr$1@ger.gmane.org>

On Thu, 01 May 2008 08:58:22 -0700, Guido van Rossum wrote:

> On Thu, May 1, 2008 at 3:20 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>>  I think Giovanni's point is an important one as well - with an
>>  iterator,
>> you can pipeline your operations far more efficiently, since you don't
>> have to wait for the whole directory listing before doing anything
>> (e.g. if you're doing some kind of move/rename operation on a
>> directory, you can start copying the first file to its new location
>> without having to wait for the directory read to finish).
>>
>>  Reducing the startup delays of an operation can be a very useful thing
>>  when
>> it comes to providing a user with a good feeling of responsiveness from
>> an application (and if it allows the application to more effectively
>> pipeline something, there may be an actual genuine improvement in
>> responsiveness, rather than just the appearance of one).
> 
> This sounds like optimizing for a super-rare case. And please do tell me
> if you've timed this.

I have; it's easy. I have several Maildir directories with tens of thousands 
of messages which take 10-15 seconds to be listed through NFS (backed 
by an ext3 file system). By contrast, commands like "grep -r 
"whatever" ." start displaying output immediately.

Without something like opendir(), it's basically impossible to 
achieve this in Python.
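
For comparison, a sketch of the incremental style, assuming a Python
version that provides os.scandir() with context-manager support (3.6 or
later). The generator name is made up for the example.

```python
import os


def iter_file_names(directory):
    """Yield regular-file names one at a time instead of building a list."""
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                # The caller can start working on this entry before
                # the rest of the directory has been read.
                yield entry.name
```

A caller that only needs the first match can stop after one next()
call, so it does not have to wait for the whole listing.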
-- 
Giovanni Bajo
Develer S.r.l.
http://www.develer.com


From mal at egenix.com  Tue May  6 13:45:53 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 06 May 2008 13:45:53 +0200
Subject: [Python-3000] PEP 3138 - String representation in Python 3000
In-Reply-To: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
Message-ID: <482044F1.8020100@egenix.com>

On 2008-05-06 02:56, Atsuo Ishimoto wrote:
> I've written a PEP for new string representation in Python 3000.
> 
> Patch is updated at http://bugs.python.org/issue2630, and Guido
> updated a patch to Rietveld:
> http://codereview.appspot.com/767 .
> 
> I would appreciate your comments and help.
 >...
> Specification
> =============
> 
> - The algorithm to build repr() strings should be changed to:
> 
>   * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.
> 
>   * Convert other non-printable ASCII characters(0x00-0x1f, 0x7f) to
>     '\\xXX'.
> 
>   * Convert leading surrogate pair characters without trailing character
>     (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.
> 
>   * Convert Unicode whitespace other than ASCII space('\\x20') and
>     control characters (categories Z* and C* in the Unicode database)
>     to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.
> 
> - Set the Unicode error-handler for sys.stdout and sys.stderr to
>   'backslashreplace' by default.

For sys.stderr it may make sense to override any error reporting
because of encoding problems. -0 on that.

For sys.stdout this doesn't make sense at all, since it hides encoding
errors for all applications using sys.stdout as piping mechanism.
-1 on that.

Both are really way beyond the scope of the PEP and I don't
really see the need for them. They also don't cover the cases
where you write the repr() to a log file, some stream or syslog.

I'd be +1 on making the error handling of sys.stdout and sys.stderr
user adjustable.
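
To make the trade-off concrete, here is a small sketch of what the two
error handlers do on an ASCII-only stream. It uses an in-memory stream
rather than the real sys.stdout, and the helper name is hypothetical.

```python
import io


def write_with_handler(text, errors):
    """Encode text to ASCII through a text stream using the given error handler."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()
```

With errors="backslashreplace" an unencodable character is written as
an escape and the caller never learns about it; with errors="strict"
the same write raises UnicodeEncodeError, which is the signal a piping
application would lose.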

> Printable characters
> --------------------
> 
> The Unicode standard doesn't define Non-printable characters, so we must
> create our own definition. Here we propose to define Non-printable
> characters as follows.
> 
> - Non-printable ASCII characters as Python 2.
> 
> - Broken surrogate pair characters.
> 
> - Characters defined in the Unicode character database as
> 
>   * Cc (Other, Control)
>   * Cf (Other, Format)
>   * Cs (Other, Surrogate)
>   * Co (Other, Private Use)
>   * Cn (Other, Not Assigned)
>   * Zl Separator, Line ('\\u2028', LINE SEPARATOR)
>   * Zp Separator, Paragraph ('\\u2029', PARAGRAPH SEPARATOR)
>   * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
>     this category should be escaped to avoid ambiguity.

This is all very nice, but if that means that the whole Unicode
database has to be loaded every time the interpreter starts up
as you indicated on the ticket, then I'm firmly -1 against that.

We've taken great care *not* to do this in Py2.x by moving
the database to a module that's imported only when needed.
It would be really silly to do this now, just to get some
Unicode repr() processed.

BTW, I'm sure it's possible to break down the above into a set of
ranges and switch cases that are easy to test without having to
look up code points in the database. Even if you do end up using
the database, it should only be imported if the repr() really
does need to look up code points outside the Latin-1 range.
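
A rough sketch of that idea: code points in the Latin-1 range are
classified with plain range tests, and the Unicode database is imported
only when a code point outside that range shows up. The function name
is made up; the category list is the one quoted from the PEP above.

```python
NONPRINTABLE_CATEGORIES = ("Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs")


def is_nonprintable(ch):
    """Return True if repr() should escape the single character ch."""
    cp = ord(ch)
    if cp < 0x100:
        # Latin-1 fast path, no database needed: C0 controls, DEL and
        # C1 controls (Cc), NO-BREAK SPACE (Zs) and SOFT HYPHEN (Cf).
        # ASCII space (0x20) is deliberately treated as printable.
        return cp < 0x20 or 0x7F <= cp <= 0xA0 or cp == 0xAD
    # Deferred import: only paid for when non-Latin-1 text is repr()'d.
    import unicodedata
    return unicodedata.category(ch) in NONPRINTABLE_CATEGORIES
```

The fast path hard-codes what the database says about the first 256
code points, so it would need rechecking if the category list changed.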

> Alternate Solutions
> -------------------
> 
> To help debugging in non-Latin languages without changing repr(), other
> suggestions were made.
> ...
> - Make the encoding used by unicode_repr() adjustable.
> 
>  There is no benefit preserving the current repr() behavior to make
>  application/library authors aware of non-ASCII repr(). And selecting
>  an encoding on printing is more flexible than having a global setting.

I'm not sure what you are saying here.

I proposed to make the Unicode repr() output a regular encoding
that's being implemented by a codec. You could then easily
change the encoding to whatever you need for your application
or console.
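
Something along these lines, as a sketch only: the helper name is
hypothetical, the existing "unicode_escape" codec stands in for
whatever codec an application would choose, and the sketch assumes the
chosen codec produces pure ASCII output.

```python
def repr_with_codec(text, encoding="unicode_escape"):
    """Quote text after running it through an escaping codec (sketch only)."""
    # Assumption: the codec emits ASCII bytes, as unicode_escape does.
    escaped = text.encode(encoding).decode("ascii")
    # The codec does not escape quotes, so do that separately.
    return "'" + escaped.replace("'", "\\'") + "'"
```

Swapping the encoding argument is then the single place where an
application would adjust how much gets escaped.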

-- 
Marc-Andre Lemburg
eGenix.com


From alec at swapoff.org  Tue May  6 14:45:22 2008
From: alec at swapoff.org (Alec Thomas)
Date: Tue, 6 May 2008 22:45:22 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>
Message-ID: <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>

2008/5/6 Barry Warsaw <barry at python.org>:
>  On May 1, 2008, at 8:03 PM, glyph at divmod.com wrote:
> > Am I the only guy who finds software that insists on visible, fixed files
> in my home directory rude?  vmware, for example, wants a "~/vmware"
> directory, but pretty much every other application I use is nice enough to
> use dotfiles (even cedega, with a roughly-comparable-to- lib "applications
> I've installed for you" folder).
>
>  No Glyph, you are not alone!  I don't even like the OS putting stuff like
> Pictures, Music, Movies, Videos and Desktop in my home directory, but I
> guess that's the price we pay for a modrin desktopy operatin' systum.

I too find this irritating.

FWIW my vote is for ~/.python. ~/.local comes in a distant second due
to non-obviousness and ~/Python is several light years beyond that.
-- 
Evolution: Taking care of those too stupid to take care of themselves.

From ishimoto at gembook.org  Tue May  6 14:53:08 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Tue, 6 May 2008 21:53:08 +0900
Subject: [Python-3000] PEP 3138 - String representation in Python 3000
In-Reply-To: <20080506060917.GA29253@phd.pp.ru>
References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
	<20080506060917.GA29253@phd.pp.ru>
Message-ID: <797440730805060553s449b863dvb7536244d1d6f252@mail.gmail.com>

On Tue, May 6, 2008 at 3:09 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:
> Hello! Well done! Thank you!

Thank you! I updated the Wiki http://wiki.python.org/moin/Python3kStringRepr
as per your suggestions.

>    Not only to log files. HTTP daemons, e.g., run with one locale but
>  answer to all kinds of clients.
>

I thought it was not a good idea to use repr() to render HTML, but I
should have remembered the cgitb module.

Thank you for your help!

From jeremy at alum.mit.edu  Tue May  6 15:24:24 2008
From: jeremy at alum.mit.edu (Jeremy Hylton)
Date: Tue, 6 May 2008 09:24:24 -0400
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <bbaeab100805052322l6a4aad62wb383e3c610ad3a44@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<ca471dc20805051533g2943483cyf45050d80f60508c@mail.gmail.com>
	<bbaeab100805051620w19b8cf65see16cb4ebd213bc1@mail.gmail.com>
	<1afaf6160805051703y2e3ee58fv5ecddeaaf29dc0c5@mail.gmail.com>
	<bbaeab100805052322l6a4aad62wb383e3c610ad3a44@mail.gmail.com>
Message-ID: <e8bf7a530805060624w2c2d3a46o66e5e46d9d116ba0@mail.gmail.com>

If we want to grab a particular restructuring task, is there a way to
record that we're working on it?

Jeremy

On Tue, May 6, 2008 at 2:22 AM, Brett Cannon <brett at python.org> wrote:
> On Mon, May 5, 2008 at 5:03 PM, Benjamin Peterson
>  <musiccomposition at gmail.com> wrote:
>  > On Mon, May 5, 2008 at 6:20 PM, Brett Cannon <brett at python.org> wrote:
>  >  > On Mon, May 5, 2008 at 3:33 PM, Guido van Rossum <guido at python.org> wrote:
>  >  >  >
>  >
>  > >  >  I've accepted this PEP.
>  >  >
>  >  >  Woohoo!
>  >
>  >  Congrats!
>  >
>  >
>  >  >
>  >  >  > Everyone, get to work on implementing this!
>  >  >  >  I'm sure some small nits will come up during the work that nobody
>  >  >  >  anticipated during the PEP discussion. In that case, let's be flexible
>  >  >  >  and work to update the PEP with the best possible solution.
>  >  >
>  >  >  And use the PEP to keep track of what state everything is in!
>  >  >  Hopefully I will start work on this tonight or tomorrow.
>  >
>  >  What can I/we do to help?
>
>  Once I have worked out exactly what needs to be done for each possible
>  thing (deletion, rename), then going through the motions in terms of
>  just doing the right thing for 2.6/3.0. I have an idea on how I want
>  to test the deletion warnings. Once I have that in place then it
>  should be a matter of adding the tests to test_py3kwarn, the warning
>  in the module, and the proper note in the docs.
>
>  -Brett
>
>

From ncoghlan at gmail.com  Tue May  6 15:51:32 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 06 May 2008 23:51:32 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>	<61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>
	<5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
Message-ID: <48206264.1010507@gmail.com>

Alec Thomas wrote:
> FWIW my vote is for ~/.python. ~/.local comes in a distant second due
> to non-obviousness and ~/Python is several light years beyond that.

I think if the obviousness (or lack thereof) of the chosen directory 
name ever really matters to anyone, we did it wrong. After all, unless 
you're trying to use something other than distutils to get a package 
ready for installation, how often does it really matter that the 
site-packages directory for an installed python interpreter actually 
lives somewhere inside /usr/local?

The main advantage I see to using the "~/.local" approach is that a lot 
of questions about file layout (e.g. where to put architecture specific 
code) are automatically (and fairly obviously) answered "Do whatever is 
done for the system-wide equivalent in /usr/local".

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ishimoto at gembook.org  Tue May  6 15:55:48 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Tue, 6 May 2008 22:55:48 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
Message-ID: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>

(I changed subject)

Thank you for your comment.

On Tue, May 6, 2008 at 8:45 PM, M.-A. Lemburg <mal at egenix.com> wrote:

>  For sys.stdout this doesn't make sense at all, since it hides encoding
>  errors for all applications using sys.stdout as piping mechanism.
>  -1 on that.

You can have UnicodeEncodeError raised for encoding errors if you want, by
setting sys.stdout's error handler to `strict`.

>
>  Both are really way beyond the scope of the PEP and I don't
>  really see the need for them.

Even if this PEP is rejected, I'll still propose changing the
default error handler for sys.stdout and sys.stderr to
'backslashreplace'. For Python 2, the 'strict' error handler is acceptable
because most text data are 8-bit strings, but for Py3K, raising an
exception whenever the printed text contains a character not supported by
the console is annoying.
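
To illustrate the difference between the two error handlers, here is a
minimal sketch, assuming the io.TextIOWrapper API of current Python 3
releases. The helper name encode_via_stream and the ASCII target encoding
are made up for the example; ASCII just stands in for a console with a
limited character set.

```python
import io


def encode_via_stream(text, encoding="ascii", errors="strict"):
    """Write text through a text stream and return the bytes it produced."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    data = raw.getvalue()
    stream.detach()  # keep the wrapper from closing the underlying buffer
    return data


# 'backslashreplace' keeps the output readable instead of failing.
print(encode_via_stream("caf\u00e9", errors="backslashreplace"))

# 'strict' raises as soon as a character cannot be encoded.
try:
    encode_via_stream("caf\u00e9", errors="strict")
except UnicodeEncodeError as exc:
    print("strict handler raised:", exc.reason)
```

With 'backslashreplace' the unsupported character comes out as an escape
sequence, so a print never aborts the program; with 'strict' the caller has
to be prepared for the exception.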

>  They also don't cover the cases
>  where you write the repr() to a log file, some stream or syslog.

Sure. I missed some cases, such as cgitb module or logging module.
I'll investigate them later. If you have another candidate, please let
me know.

> > - Characters defined in the Unicode character database as
[snip]
>
>  This is all very nice, but if that means that the whole Unicode
>  database has to be loaded every time the interpreter starts up
>  as you indicated on the ticket, then I'm firmly -1 against that.

I changed the patch to add a flag to the _PyUnicode_TypeRecords table,
so the Unicode database is not loaded at startup.

>
>  I proposed to make the Unicode repr() output a regular encoding
>  that's being implemented by a codec. You could then easily
>  change the encoding to whatever you need for your application
>  or console.

I think a global setting is not flexible enough. And I see no benefit in a
customizable repr() except to stay compatible with Python 2, but I
think it is easy to migrate existing code to Py3k.

From ncoghlan at gmail.com  Tue May  6 16:10:43 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 07 May 2008 00:10:43 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
Message-ID: <482066E3.7030209@gmail.com>

Atsuo Ishimoto wrote:
>>  I proposed to make the Unicode repr() output a regular encoding
>>  that's being implemented by a codec. You could then easily
>>  change the encoding to whatever you need for your application
>>  or console.
> 
> I think a global setting is not flexible enough. And I see no benefit in a
> customizable repr() except to stay compatible with Python 2, but I
> think it is easy to migrate existing code to Py3k.

There's a bigger issue with trying to make whatever repr() does a codec 
in Py3k. As a Unicode->Unicode transformation, it doesn't mesh well with 
Py3k's strict Unicode->bytes/bytes->Unicode encoding/decoding philosophy.

That said, it would be nice to have a way to easily stack 
Unicode->Unicode transforms on top of text IO streams, or byte->byte 
transforms on top of binary streams.
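
A rough sketch of what such stacking could look like for the text case,
assuming nothing more than a file-like object with a write() method.
TransformingWriter is a hypothetical name, not an existing stdlib class, and
the escaping transform is only one example of a Unicode->Unicode function.

```python
import io


class TransformingWriter:
    """Apply a str -> str transform to everything written to a text stream."""

    def __init__(self, stream, transform):
        self._stream = stream
        self._transform = transform

    def write(self, text):
        # Transform first, then hand the result to the wrapped stream.
        return self._stream.write(self._transform(text))

    def __getattr__(self, name):
        # Delegate everything else (flush, close, ...) to the wrapped stream.
        return getattr(self._stream, name)


def escape_non_ascii(text):
    """Unicode -> Unicode: replace non-ASCII characters with escapes."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


buffer = io.StringIO()
writer = TransformingWriter(buffer, escape_non_ascii)
writer.write("caf\u00e9\n")
print(buffer.getvalue(), end="")
```

Because the wrapper itself exposes write(), several of them can be layered
on top of one another before reaching the real stream.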

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From skip at pobox.com  Tue May  6 16:21:35 2008
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 6 May 2008 09:21:35 -0500
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>
	<5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
Message-ID: <18464.26991.824584.173058@montanaro-dyndns-org.local>


    Alec> FWIW my vote is for ~/.python. ~/.local comes in a distant second
    Alec> due to non-obviousness and ~/Python is several light years beyond
    Alec> that.

I guess we're going to have to agree to disagree.  I find hiding directories
which contain executable code extremely non-obvious.  Would you prefer
/usr/.local to /usr/local?  If not, then why prefer ~/.local to ~/local?

Skip

From steve at holdenweb.com  Tue May  6 16:37:04 2008
From: steve at holdenweb.com (Steve Holden)
Date: Tue, 06 May 2008 10:37:04 -0400
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <18464.26991.824584.173058@montanaro-dyndns-org.local>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>	<481A35D8.60604@cheimes.de>	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>	<61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>	<5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
	<18464.26991.824584.173058@montanaro-dyndns-org.local>
Message-ID: <fvpqfk$1sk$1@ger.gmane.org>

skip at pobox.com wrote:
>     Alec> FWIW my vote is for ~/.python. ~/.local comes in a distant second
>     Alec> due to non-obviousness and ~/Python is several light years beyond
>     Alec> that.
> 
> I guess we're going to have to agree to disagree.  I find hiding directories
> which contain executable code extremely non-obvious.  Would you prefer
> /usr/.local to /usr/local?  If not, then why prefer ~/.local to ~/local?
> 
Not wanting to speak for Alec, but in my opinion the answer is mostly 
because /usr/local doesn't impinge on a home directory listing, so I 
don't care that it's visible. Naive users don't go looking around the 
filestore any more than they poke around in their hidden subdirectories.

If you want it visible, make a visible symbolic link!

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From alec at swapoff.org  Tue May  6 17:11:57 2008
From: alec at swapoff.org (Alec Thomas)
Date: Wed, 7 May 2008 01:11:57 +1000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <18464.26991.824584.173058@montanaro-dyndns-org.local>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>
	<5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
	<18464.26991.824584.173058@montanaro-dyndns-org.local>
Message-ID: <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com>

2008/5/7  <skip at pobox.com>:
>
>     Alec> FWIW my vote is for ~/.python. ~/.local comes in a distant second
>     Alec> due to non-obviousness and ~/Python is several light years beyond
>     Alec> that.
>
>  I guess we're going to have to agree to disagree.  I find hiding directories
>  which contain executable code extremely non-obvious.  Would you prefer

Python would not be unique. Mozilla/Firefox does exactly this, putting
per-user plugins in ~/.mozilla.

>  /usr/.local to /usr/local?  If not, then why prefer ~/.local to ~/local?

Because unlike a home directory, users don't frequently perform
directory listings or tab completion of /usr/. For a frequently used
personal directory one wants the minimum of noise.

Also:

  1. If every application followed the convention of creating
non-hidden paths in home directories the directory listing would be
*incredibly* noisy. To illustrate, I have 160 dotfiles, most of which
were created by applications. I have only 8 non-hidden directories,
all of which I have created myself.
  2. Non-hidden directories interfere with tab completion muscle memory.
  3. On a more subjective note, home directories are personal space.
People shape them to their personality, and interfering with this is
impolite.
  4. Per-application dotfiles have 30 years of convention behind them.
Conversely, only a few applications use ~/.local (for example, Openbox
and Audacious both look for configuration here) and none that I'm
aware of default to ~/local.
  5. Applications that create non-hidden directories in user home
directories are generally perceived as being obnoxious.

-- 
Evolution: Taking care of those too stupid to take care of themselves.

From janssen at parc.com  Tue May  6 17:30:35 2008
From: janssen at parc.com (Bill Janssen)
Date: Tue, 6 May 2008 08:30:35 PDT
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com> 
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>
	<5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
	<18464.26991.824584.173058@montanaro-dyndns-org.local>
	<5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com>
Message-ID: <08May6.083045pdt."58696"@synergy1.parc.xerox.com>

> >  /usr/.local to /usr/local?  If not, then why prefer ~/.local to ~/local?
> 
> Because unlike a home directory, users don't frequently perform
> directory listings or tab completion of /usr/. For a frequently used
> personal directory one wants the minimum of noise.

Glad someone around here knows actual facts about the statistics of
using "ls" :-).  Can you point to published user studies about this?
If not, let me just say that I perform directory listings of /usr a
whole lot *more* than my home directory.

Um, isn't this all argument about what color to paint the shed?

Bill

From collinw at gmail.com  Tue May  6 18:07:00 2008
From: collinw at gmail.com (Collin Winter)
Date: Tue, 6 May 2008 09:07:00 -0700
Subject: [Python-3000] PEP 3108 - stdlib reorg/cleanup
In-Reply-To: <bbaeab100805011102l4d45750ewe605962ade512d33@mail.gmail.com>
References: <bbaeab100804281930h598e5889k88fdb322dd5573a2@mail.gmail.com>
	<43aa6ff70805010741m1a39e740ic5e42e2c74109b03@mail.gmail.com>
	<bbaeab100805011102l4d45750ewe605962ade512d33@mail.gmail.com>
Message-ID: <43aa6ff70805060907m5568969dqcce8aa1127177b44@mail.gmail.com>

On Thu, May 1, 2008 at 11:02 AM, Brett Cannon <brett at python.org> wrote:
> On Thu, May 1, 2008 at 7:41 AM, Collin Winter <collinw at gmail.com> wrote:
>  >
>  > On Mon, Apr 28, 2008 at 7:30 PM, Brett Cannon <brett at python.org> wrote:
>
>
> >  >  Transition Plan
>  >  >  ===============
>  >  >
>  >  >  For modules to be removed
>  >  >  -------------------------
>  >  >
>  >  >  For the removal of modules that are continuing to exist in the Python
>  >  >  2.x series (i.e., not deprecated explicitly in the 2.x series),
>  >  >  ``warnings.warn3k()`` will be used to issue a DeprecationWarning.
>  >
>  >  FYI, we can also flag these using 2to3.
>  >
>
>  I can't remember if we have a guiding rule on this yet, but if 2to3
>  can fix this, do we still want the warning? Obviously both names will
>  be provided so people can move their code over, but perhaps the
>  warning is not needed?

I say keep the runtime warning. 2to3 can't fix the cases where the
module is being removed entirely; the best it can do is to flag the
import statement as requiring the user's attention.

>  >  >  Renaming of modules
>  >  >  -------------------
>  >  >
>  >  >  For modules that are renamed, stub modules will be created with the
>  >  >  original names and be kept in a directory within the stdlib (e.g. like
>  >  >  how lib-old was once used).  The need to keep the stub modules within
>  >  >  a directory is to prevent naming conflicts with case-insensitive
>  >  >  filesystems in those cases where nothing but the case of the module
>  >  >  is changing.
>  >  >
>  >  >  These stub modules will import the module code based on the new
>  >  >  naming.  The same type of warning being raised by modules being
>  >  >  removed will be raised in the stub modules.
>  >  >
>  >  >  Support in the 2to3 refactoring tool for renames will also be used
>  >  >  [#2to3]_.  Import statements will be rewritten so that only the import
>  >  >  statement and none of the rest of the code needs to be touched.  This
>  >  >  will be accomplished by using the ``as`` keyword in import statements
>  >  >  to bind in the module namespace to the old name while importing based
>  >  >  on the new name.
>  >
>  >  You should cite the existing fix_imports fixer as one example of how
>  >  to do this: http://svn.python.org/view/sandbox/trunk/2to3/lib2to3/fixes/fix_imports.py?view=markup
>
>  Done.
>
>  -Brett
>
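
A minimal sketch of the two rename mechanisms described in the quoted PEP
text, assuming the ConfigParser -> configparser rename as the example.
load_renamed is a hypothetical helper standing in for what a stub module
would do; it uses the plain warnings.warn() call rather than any
py3k-specific warning function.

```python
import importlib
import warnings


def load_renamed(old_name, new_name):
    """Do what a stub module for old_name would do: warn, then import new_name."""
    warnings.warn(
        "the %s module has been renamed to %s" % (old_name, new_name),
        DeprecationWarning,
        stacklevel=2,
    )
    return importlib.import_module(new_name)


# The import rewrite described for 2to3: import by the new name, but bind
# the module to the old name so the rest of the code is left untouched.
import configparser as ConfigParser

parser = ConfigParser.ConfigParser()
print(type(parser).__module__)
```

The stub approach keeps old imports working while flagging them at runtime;
the ``as`` rewrite changes only the import line itself.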

From guido at python.org  Tue May  6 18:37:00 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 6 May 2008 09:37:00 -0700
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>
	<5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
	<18464.26991.824584.173058@montanaro-dyndns-org.local>
	<5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com>
Message-ID: <ca471dc20805060937q4cdd2da9qd39e0bf2a09e3358@mail.gmail.com>

On Tue, May 6, 2008 at 8:11 AM, Alec Thomas <alec at swapoff.org> wrote:
>  Python would not be unique. Mozilla/Firefox does exactly this, putting
>  per-user plugins in ~/.mozilla.

Note that this is moot since I'm going to accept the PEP as it stands
(i.e. ~/.local) but I want to point out something that seems to be
lost occasionally.

Hiding stuff in dot files is the right thing to do when there's a
separate API (like Mozilla) to manage those files. It is IMO much more
questionable when the user is expected to manage things directly using
the standard filesystem API. That's why Pictures etc. are not dot
files.

Of course, there's a gray area -- grizzled Unix wizards manage dozens
of dot files like .profile and .exrc -- but I still think this is a
useful (partial) guiding principle.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue May  6 18:42:20 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 6 May 2008 09:42:20 -0700
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
	07-May-2008
In-Reply-To: <fvpqfk$1sk$1@ger.gmane.org>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>
	<5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
	<18464.26991.824584.173058@montanaro-dyndns-org.local>
	<fvpqfk$1sk$1@ger.gmane.org>
Message-ID: <ca471dc20805060942t5a4f61cq9e5036717d78d2c5@mail.gmail.com>

On Tue, May 6, 2008 at 7:37 AM, Steve Holden <steve at holdenweb.com> wrote:
>  If you want it visible, make a visible symbolic link!

Note that the point is moot, since I'm going to accept Christian's
PEP, i.e. ~/.local, but this argument "you can make it visible
yourself" is bogus. The point of visibility (when it's brought up)
isn't that you *can* make it visible -- you can always do that with ls
-a or whatever Finder option. The point is that (in some people's
view) the results of an action should be left *in plain sight* so that
the user has clear evidence of what happened.

I'm fine in this case with the counterarguments though, so I'll be
accepting the PEP in a minute.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From skip at pobox.com  Tue May  6 18:48:34 2008
From: skip at pobox.com (skip at pobox.com)
Date: Tue, 6 May 2008 11:48:34 -0500
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
	<61E8E3D5-B66F-4CA0-B5CF-3A7BB3CE12BD@python.org>
	<5b52d53b0805060545s513ce7e4m52f07275e450ba9a@mail.gmail.com>
	<18464.26991.824584.173058@montanaro-dyndns-org.local>
	<5b52d53b0805060811h34f6cb29sb8539a490b7be3d0@mail.gmail.com>
Message-ID: <18464.35810.137067.640251@montanaro-dyndns-org.local>


    >> /usr/.local to /usr/local?  If not, then why prefer ~/.local to ~/local?

    Alec> Because unlike a home directory, users don't frequently perform
    Alec> directory listings or tab completion of /usr/. For a frequently
    Alec> used personal directory one wants the minimum of noise.

I don't mind the system clearly telling me about code I've installed.
That's a lot different than Mozilla hiding its internal stuff in
~/.mozilla.

Skip


From guido at python.org  Tue May  6 19:03:59 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 6 May 2008 10:03:59 -0700
Subject: [Python-3000] PEP 370 (was Re: [Python-Dev] Reminder: last
	alphas next Wednesday 07-May-2008)
In-Reply-To: <481E15B7.9060003@cheimes.de>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<2B80BDE9-0CF4-4741-BF33-A6EC3BB0462A@python.org>
	<B3CAA68B-FDBA-45E6-A505-F0A23195C5E7@acm.org>
	<18459.43976.85481.758104@montanaro-dyndns-org.local>
	<481C2AE7.9010805@gmail.com>
	<18460.20940.882777.235301@montanaro-dyndns-org.local>
	<45D12407-959A-43C5-B9E4-3E1D9F2B6B29@acm.org>
	<20080504135803.25821.2140838939.divmod.xquotient.7545@joule.divmod.com>
	<481DC6A3.70104@gmail.com> <481E15B7.9060003@cheimes.de>
Message-ID: <ca471dc20805061003t19bc88e0g7c2b420fdd6d9d35@mail.gmail.com>

All,

I've accepted PEP 370, Christian Heimes's proposal to add a per-user
site-package directory. The location will be somewhere under ~/.local
for Unix/Linux/OS X, and %APPDATA%/Python for Windows (per the
original proposal in the PEP).
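
For reference, a small sketch of how the resulting locations can be
inspected, assuming the helper functions that the site module provides in
current Python releases (they may not match what was available when the PEP
was merged). The exact paths printed depend on the platform and the
interpreter version.

```python
import site

# Base directory for per-user installs, e.g. ~/.local on Linux.
user_base = site.getuserbase()
# Per-user site-packages directory located under that base.
user_site = site.getusersitepackages()

print("user base:", user_base)
print("user site:", user_site)
# False or None when the user site directory is disabled for this process.
print("enabled:  ", site.ENABLE_USER_SITE)
```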

Congratulations Christian, and thanks for championing this.

Thanks also to everyone who contributed to the discussion and showed
the error of my ways -- especially those who did so in under 100
words. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue May  6 19:19:08 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 6 May 2008 10:19:08 -0700
Subject: [Python-3000] Invitation to try out open source code review tool
In-Reply-To: <8DDD3FFD-BFF8-43EE-9F02-CA4013A3037C@hexten.net>
References: <8DDD3FFD-BFF8-43EE-9F02-CA4013A3037C@hexten.net>
Message-ID: <ca471dc20805061019i29f6268m14dee0811e901cb6@mail.gmail.com>

On Fri, May 2, 2008 at 12:39 PM, Andy Armstrong <andy at hexten.net> wrote:
> Hi Guido,
>
>  I'm afraid I've added a Perl based project (Test::Harness). I then went
> back and read your post and got to the bit where you specifically invited
> *Python* developers. Sorry about that. I'm not trying to colonise
> Pythonspace with Perl, honest :)

No problem! I didn't mean to be exclusive. You're more than welcome to
use Rietveld. We'll be making an announcement later today that opens
it up for everyone anyway, and any experiences you have to share are
welcome on the codereview-discuss Google group.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org  Tue May  6 20:47:49 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Wed, 07 May 2008 03:47:49 +0900
Subject: [Python-3000] PEP 3108 - String representation in Python 3000
In-Reply-To: <482044F1.8020100@egenix.com>
References: <797440730805051756t16bda524mc3cb01cb010c7355@mail.gmail.com>
	<482044F1.8020100@egenix.com>
Message-ID: <874p9b9smi.fsf@uwakimon.sk.tsukuba.ac.jp>

M.-A. Lemburg writes:
 > This is all very nice, but if that means that the whole Unicode
 > database has to be loaded every time the interpreter starts up

Ouch.

 > BTW, I'm sure it's possible to break down the above into a set of
 > ranges and switch cases that are easy to test without having to
 > lookup code points in the database. Even if you do end up using
 > the database, it should only be imported if the repr() really
 > does not need to lookup code points outside the Latin-1 range.

You mean, "really does need to look up", right?

Would it be too disgusting to have a simple range-based repr() as a
builtin, and replace it with a lookup-based repr() defined in the
Unicode database?
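
A sketch of what a purely range-based escape could look like, assuming the
simplest possible rule: printable ASCII passes through and everything else
is escaped by code point range, with no database lookup at all. The function
name and the exact ranges are illustrative only and are not taken from the
PEP or the patch.

```python
def range_based_escape(text):
    """Escape every character outside printable ASCII, by range only."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            parts.append("\\\\")          # keep the output unambiguous
        elif 0x20 <= cp <= 0x7E:
            parts.append(ch)              # printable ASCII passes through
        elif cp <= 0xFF:
            parts.append("\\x%02x" % cp)  # control characters and Latin-1
        elif cp <= 0xFFFF:
            parts.append("\\u%04x" % cp)  # rest of the BMP
        else:
            parts.append("\\U%08x" % cp)  # supplementary planes
    return "".join(parts)


print(range_based_escape("a\n\u00e9\u3042"))
```

A lookup-based version would replace the range tests with a per-character
"is this printable?" query, which is where the database cost comes in.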


From barry at python.org  Wed May  7 00:43:05 2008
From: barry at python.org (Barry Warsaw)
Date: Tue, 6 May 2008 18:43:05 -0400
Subject: [Python-3000] [Python-Dev] PEP 370 heads up
In-Reply-To: <4820DDD8.2040600@cheimes.de>
References: <4820DDD8.2040600@cheimes.de>
Message-ID: <8F920497-9672-4AE4-9DD2-FC76F98EC568@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 6, 2008, at 6:38 PM, Christian Heimes wrote:

> Guido has accepted my user site directory PEP today.
> http://python.org/dev/peps/pep-0370/
>
> I'm about to merge the code. But first I'd like to let you know some
> things and get your opinion.

Very awesome Christian!  I'm psyched for this to get into the last  
alpha releases, which I remind everyone happens tomorrow.  Plan on svn  
tree freeze at approximately 6pm EDT (2200 UTC).

Cheers,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCDe+nEjvBPtnXfVAQJJMAP/XiZvXPptw8tZ4/01hD7r39/lWgoDUmjp
gVzne4+XMfz8NcLQMP2+Y38cPrQziyG8BYDqN/vWT641bOwv20QHuZYFvI9Kr09q
jTEC39DzNRfD6ThzD/na6M1M7glpXiWr3hj4Va56JEnn1ekj6Ejb7BoW1oyuyz6T
gUuAgVT2lOw=
=2IIq
-----END PGP SIGNATURE-----

From lists at cheimes.de  Wed May  7 00:50:37 2008
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 07 May 2008 00:50:37 +0200
Subject: [Python-3000] [Python-Dev] PEP 370 heads up
In-Reply-To: <8F920497-9672-4AE4-9DD2-FC76F98EC568@python.org>
References: <4820DDD8.2040600@cheimes.de>
	<8F920497-9672-4AE4-9DD2-FC76F98EC568@python.org>
Message-ID: <4820E0BD.9040405@cheimes.de>

Barry Warsaw schrieb:
> Very awesome Christian!  I'm psyched for this to get into the last alpha
> releases, which I remind everyone happens tomorrow.  Plan on svn tree
> freeze at approximately 6pm EDT (2200 UTC).

Thanks Barry! Also thanks to Glyph, Nick and all the other people that
stepped in during the discussion in favor of ~/.local!

Christian

PS: I'll try to get json into shape for Python 3.0. It's going to be
tricky for various reasons. For example, the re module still doesn't
support bytes.

From barry at python.org  Thu May  8 01:24:32 2008
From: barry at python.org (Barry Warsaw)
Date: Wed, 7 May 2008 19:24:32 -0400
Subject: [Python-3000] Releasing alphas tonight
Message-ID: <E7F4F775-76F9-4A1F-9910-7787D645FCC3@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

Just a reminder that I'm going to be cutting the releases tonight.   
Because of work, I didn't make the 6pm EDT goal, and now I have to run  
out for a few hours.  I will send another message when I'm ready to  
start spinning the release, but figure it will be at about 10pm EDT.   
Please limit your commits between now and then to only things you  
absolutely know will improve stability and test passing.

Thanks,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCI6MXEjvBPtnXfVAQIfgwP+P7XOTMwWex5+YwiOza0fEeUj5n8OJuxU
ISK3p3Tas4tPM65eMCHk5vmIFOBfJDFyWBpNhGr+uKmaWMgiqtPX5fs6nMmkbkrY
dWrfG5Mgth9U1hpR4/1y/p2W82DJX9exmnjYL2BxjZ/TGeZdbcpUcs6Cc/fpHKR/
wTQ3dagAPNA=
=bDtn
-----END PGP SIGNATURE-----

From lists at cheimes.de  Thu May  8 02:51:43 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 08 May 2008 02:51:43 +0200
Subject: [Python-3000] Releasing alphas tonight
In-Reply-To: <E7F4F775-76F9-4A1F-9910-7787D645FCC3@python.org>
References: <E7F4F775-76F9-4A1F-9910-7787D645FCC3@python.org>
Message-ID: <48224E9F.40407@cheimes.de>

Barry Warsaw schrieb:
> Hi all,
> 
> Just a reminder that I'm going to be cutting the releases tonight.
> Because of work, I didn't make the 6pm EDT goal, and now I have to run
> out for a few hours.  I will send another message when I'm ready to
> start spinning the release, but figure it will be at about 10pm EDT.
> Please limit your commits between now and then to only things you
> absolutely know will improve stability and test passing.

The py3k branch has a major show stopper: it's leaking references to the
max.

...
test_builtin leaked [14, 14, 14, 14] references, sum=56
test_exceptions
beginning 9 repetitions
123456789
.........
test_exceptions leaked [40, 40, 40, 40] references, sum=160
test_types
beginning 9 repetitions
123456789
.........
test_types leaked [2, 2, 2, 2] references, sum=8
test_unittest
beginning 9 repetitions
123456789
.........
test_unittest leaked [23, 23, 23, 23] references, sum=92
...

I'm trying to find the issue.

Christian

From lists at cheimes.de  Thu May  8 03:25:01 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 08 May 2008 03:25:01 +0200
Subject: [Python-3000] Releasing alphas tonight
In-Reply-To: <48224E9F.40407@cheimes.de>
References: <E7F4F775-76F9-4A1F-9910-7787D645FCC3@python.org>
	<48224E9F.40407@cheimes.de>
Message-ID: <4822566D.7080207@cheimes.de>

Christian Heimes schrieb:
> The py3k branch has a major show stopper: it's leaking references to the
> max.

Fixed ;)

Christian

From musiccomposition at gmail.com  Thu May  8 04:21:13 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Wed, 7 May 2008 21:21:13 -0500
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
Message-ID: <1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com>

Can I go ahead and remove this then?

>
>   > It seems that os.walk has more options and a cleaner interface to
>   > walking trees than os.path.walk does. Is there support for the removal
>   > of this in Py3k?
>   >
>   > --
>   > Cheers,
>   > Benjamin Peterson
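
For comparison, a short sketch of the os.walk interface referred to above:
it is a generator yielding (dirpath, dirnames, filenames) tuples instead of
taking a callback. list_files is a hypothetical helper written for this
example.

```python
import os
import tempfile


def list_files(root):
    """Return the paths of all files under root, relative to root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place controls the order of the traversal.
        dirnames.sort()
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
    return found


# Build a tiny throwaway tree and walk it.
demo_root = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_root, "pkg"))
open(os.path.join(demo_root, "pkg", "module.py"), "w").close()
open(os.path.join(demo_root, "README"), "w").close()
print(list_files(demo_root))
```

Editing dirnames in place (sorting it, or deleting entries to prune
subtrees) is one of the extra options the generator form allows.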


-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From guido at python.org  Thu May  8 05:12:46 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 7 May 2008 20:12:46 -0700
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com>
Message-ID: <ca471dc20805072012y251d3e24q367a7088fb9b1845@mail.gmail.com>

On Wed, May 7, 2008 at 7:21 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> Can I go ahead and remove this then?

Yes, but let's do it after Barry has released the alphas.

>  >   > It seems that os.walk has more options and a cleaner interface to
>  >   > walking trees than os.path.walk does. Is there support for the removal
>  >   > of this in Py3k?
>  >   >
>  >   > --
>  >   > Cheers,
>  >   > Benjamin Peterson
>
>
>
>  --
>  Cheers,
>  Benjamin Peterson
>  "There's no place like 127.0.0.1."
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From barry at python.org  Thu May  8 06:36:00 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 00:36:00 -0400
Subject: [Python-3000] Releasing alphas tonight
In-Reply-To: <4822566D.7080207@cheimes.de>
References: <E7F4F775-76F9-4A1F-9910-7787D645FCC3@python.org>
	<48224E9F.40407@cheimes.de> <4822566D.7080207@cheimes.de>
Message-ID: <B0891873-FB67-4B33-AB83-61DEFE4FD204@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 7, 2008, at 9:25 PM, Christian Heimes wrote:

> Christian Heimes schrieb:
>> The py3k branch has a major show stopper: it's leaking references
>> to the max.
>
> Fixed ;)

Thanks!

Folks, I apologize.  I had some system problems tonight so I fell  
behind on the release.  I just applied Antoine's patch for bug 2507  
and I'd like to make sure the buildbots complete.  Other than that,
the only other release-critical issue is one for the release process.  I'll
complete the releases tomorrow morning (EDT) so in the meantime,  
please refrain from committing anything.

Thanks,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCKDMXEjvBPtnXfVAQL1CgP9Heg1XlNjpM3wgT4N8PK090HnaGIJ6MzH
Fs3QtngvLB/YPf31VrkYILIIMG/YBs+yqCZFziuSR2alNYNBcvwNVfpIljMuq9AM
qtj+cu2vbhkoh+gR8LjM1J8ZWAKhI5G6eAxHuGlTWykdumcSllkB6xW4uLQ2RolZ
eOS5Avnc/Qs=
=kJdD
-----END PGP SIGNATURE-----

From barry at python.org  Thu May  8 06:36:22 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 00:36:22 -0400
Subject: [Python-3000] [Python-Dev] Releasing alphas tonight
In-Reply-To: <001801c8b0ac$9b902dc0$0200a8c0@whiterabc2znlh>
References: <E7F4F775-76F9-4A1F-9910-7787D645FCC3@python.org>
	<48224E9F.40407@cheimes.de>
	<001801c8b0ac$9b902dc0$0200a8c0@whiterabc2znlh>
Message-ID: <6AB977B4-8300-4A07-B7CF-F2403C5D4156@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 7, 2008, at 9:41 PM, Hirokazu Yamamoto wrote:

> Hello.
>
>> The py3k branch has a major show stopper: it's leaking references
>> to the max.
>
> Is there any chance this leak also will be fixed?
> http://bugs.python.org/issue2222

Not for the alphas, sorry.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCKDRnEjvBPtnXfVAQKjGAP8CFKRDMBYJC8dm+tR/nHucrRa/Nqfy977
I8rx/B5QWN+feBk6LhODaEQ2NPOQaF+iSTaDnOlF9f2+Z6m85b94zsLJPY9EoiAC
qdNmYBmZWYtuzvLmCh5Ef2aCjtfbn4Ik8i3SR9amQJBhuq7ubbdYVUsbcy6HCUUV
K2Xp8LV1HWM=
=aYB/
-----END PGP SIGNATURE-----

From barry at python.org  Thu May  8 13:32:42 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 07:32:42 -0400
Subject: [Python-3000] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <20080508043520.B60821E400E@bag.python.org>
References: <20080508043520.B60821E400E@bag.python.org>
Message-ID: <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 8, 2008, at 12:35 AM, raymond.hettinger wrote:

> Author: raymond.hettinger
> Date: Thu May  8 06:35:20 2008
> New Revision: 62848
>
> Log:
> Frozensets do not benefit from autoconversion.

Since the trunk buildbots appear to be mostly happy (well those that  
are connected anyway), and because I couldn't get the releases out  
last night, I'll let this one slide.  I'd like to find a way to more  
forcefully enforce commit freezes for the betas though.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCLk2nEjvBPtnXfVAQJxiwP/VPTmeKVLoKkc/xIF0tc/lb6pT7kZ0swL
b1M2TUkl/+xOuKf3J2EIkHOiKdNNmivl80nG/wP9/VTa7lVJGnWgIeLi0yC20Q9n
wvtHaXCrHDc4/ibiShjwYqD4YR0BGwJI7BrlyCYzohbjFK6QYsxd+5a96Cipb/cB
+K/Akjqry4Q=
=xQfb
-----END PGP SIGNATURE-----

From musiccomposition at gmail.com  Thu May  8 13:54:35 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Thu, 8 May 2008 06:54:35 -0500
Subject: [Python-3000] [Python-Dev] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>
References: <20080508043520.B60821E400E@bag.python.org>
	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>
Message-ID: <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>

On Thu, May 8, 2008 at 6:32 AM, Barry Warsaw <barry at python.org> wrote:
>  Since the trunk buildbots appear to be mostly happy (well those that are
> connected anyway), and because I couldn't get the releases out last night,
> I'll let this one slide.  I'd like to find a way to more forcefully enforce
> commit freezes for the betas though.

I wonder if you couldn't alter the server side commit hook to reject
everything with the message "Sorry, we're in a freeze." (You'd have to
make an exception for yourself.)
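
A minimal sketch of such a hook, assuming a Subversion pre-commit script
written in Python.  The flag-file path, the exempt account name, and the
check_commit() helper are all hypothetical; only the hook calling
convention (REPOS and TXN arguments, non-zero exit rejects the commit,
stderr is shown to the committer) and `svnlook author` come from
Subversion itself.

```python
#!/usr/bin/env python
"""Sketch of a Subversion pre-commit hook that enforces a commit freeze."""
import os
import subprocess
import sys

FREEZE_FLAG = "/path/to/repo/hooks/FREEZE"   # hypothetical flag file
EXEMPT = {"barry.warsaw"}                    # hypothetical release-manager account


def check_commit(author, frozen, exempt=EXEMPT):
    """Return (exit_status, message); a non-zero status rejects the commit."""
    if frozen and author not in exempt:
        return 1, "Sorry, we're in a freeze."
    return 0, ""


def main(argv):
    # Subversion invokes pre-commit hooks as: pre-commit REPOS TXN
    repos, txn = argv[1], argv[2]
    author = subprocess.check_output(
        ["svnlook", "author", "-t", txn, repos]).decode().strip()
    status, message = check_commit(author, os.path.exists(FREEZE_FLAG))
    if message:
        sys.stderr.write(message + "\n")   # stderr is relayed to the committer
    return status


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

With this layout the release manager would create the flag file to start
the freeze and delete it once the tags exist.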


-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From barry at python.org  Thu May  8 13:59:50 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 07:59:50 -0400
Subject: [Python-3000] [Python-Dev] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
References: <20080508043520.B60821E400E@bag.python.org>
	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>
	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
Message-ID: <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 8, 2008, at 7:54 AM, Benjamin Peterson wrote:

> On Thu, May 8, 2008 at 6:32 AM, Barry Warsaw <barry at python.org> wrote:
>> Since the trunk buildbots appear to be mostly happy (well those that are
>> connected anyway), and because I couldn't get the releases out last night,
>> I'll let this one slide.  I'd like to find a way to more forcefully enforce
>> commit freezes for the betas though.
>
> I wonder if you couldn't alter the server side commit hook to reject
> everything with the message "Sorry, we're in a freeze." (You'd have to
> make an exception for yourself.)

This is exactly what I'm thinking about!

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCLrNnEjvBPtnXfVAQITDwP/WGqlRHSfvE668clPM3gshhYbAapZcF+e
mNKGwu407/q03LYRqHr2QY0gBxsySJBWl5OsozmJUOTc7NEY/E/MtiauauzCJiyO
24sJ2V52aROwYBLG+4tLFcaGmWmnsWPg79Qj/yJQKMMiH5OznPfagLECOjlwDZZA
ianWqOZxeYc=
=xyD7
-----END PGP SIGNATURE-----

From barry at python.org  Thu May  8 14:03:54 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 08:03:54 -0400
Subject: [Python-3000] Fwd: [issue2547] Py30a4 RELNOTES only cover 30a1 and
	30a2
References: <1210247710.02.0.210073560245.issue2547@psf.upfronthosting.co.za>
Message-ID: <88A090EF-7B71-46F7-9F08-560FCAE07762@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Begin forwarded message:

> From: "Barry A. Warsaw" <report at bugs.python.org>
> Date: May 8, 2008 7:55:10 AM EDT
> To: barry at python.org
> Subject: [issue2547] Py30a4 RELNOTES only cover 30a1 and 30a2
> Reply-To: Tracker <report at bugs.python.org>
>
>
> Barry A. Warsaw <barry at python.org> added the comment:
>
> I've updated the release script to at least touch RELNOTES, but I'm
> unsure as to what the policy is for updating the content of this file.
> I'm closing this issue but will bring it up on the mailing list.
>
> __________________________________
> Tracker <report at bugs.python.org>
> <http://bugs.python.org/issue2547>
> __________________________________

So there was a release critical issue open about making sure to update  
Py3k's RELNOTES file.  I've updated the release script so that I'll be  
sure to edit this file, however I'm not sure what the policy is on  
updating it.  Would you expect me to update it and if so, from what  
data source?  Do we list all open critical bugs on the Py3k tracker?   
All open PEPs?

I'd like to ask everyone doing Py3k development to help pitch in and  
keep this file up-to-date.  I think this will be more important as we  
move to beta releases starting next cycle.

Cheers,
- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCLsK3EjvBPtnXfVAQJdsAP8CFBVoUk6Zubmw5NWfOywWQH5kg1oLcm4
mhXm5kGKcPvouNphOs6P4UxqG3l8/Fib0cD5TLCx6SFDCDwamuPSogLBGvCxFpZu
ztjMGyVWNraxxHDgQ1suq1LvOItIMeA6SHqozRpNJ+UchfaEPu8weRSWT0VGB/bN
qoYTQ8rPWwQ=
=poFO
-----END PGP SIGNATURE-----

From lists at cheimes.de  Thu May  8 14:21:54 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 08 May 2008 14:21:54 +0200
Subject: [Python-3000] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
Message-ID: <4822F062.7090305@cheimes.de>

Barry Warsaw wrote:
> This is exactly what I'm thinking about!

-1

A technical solution never solves a social problem. It's just going to
cause more social and technical problems.

All community members with svn write privileges must subscribe to the
Python developer list. Committers must check the lists prior to a
check-in if a release is imminent. Releases are announced at least four days
prior to svn freeze so it's not going to be a problem. The problem often
lies with occasional committers and maintainers of stdlib packages.
People need to show more discipline or eventually we have to
(temporarily) revoke their privileges.

Christian

From barry at python.org  Thu May  8 15:20:42 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 09:20:42 -0400
Subject: [Python-3000] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <4822F062.7090305@cheimes.de>
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
	<4822F062.7090305@cheimes.de>
Message-ID: <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 8, 2008, at 8:21 AM, Christian Heimes wrote:

> Barry Warsaw wrote:
>> This is exactly what I'm thinking about!
>
> -1
>
> A technical solution never solves a social problem. It's just going to
> cause more social and technical problems.

In this case I disagree.  Given our global nature and the vast amounts  
of email we all get, I think a friendly little svn commit hook  
reminder is a simple and workable solution.  This commit lock really  
doesn't need to be in place for very long.  Optimistically, I only  
need it long enough to create the tags, which /normally/ should take  
me 10 minutes.

> All community members with svn write privileges must subscribe to the
> Python developer list. Committers must check the lists prior to a
> check-in if a release is imminent. Releases are announced at least four
> days prior to svn freeze so it's not going to be a problem. The problem
> often lies with occasional committers and maintainers of stdlib packages.
> People need to show more discipline or eventually we have to
> (temporarily) revoke their privileges.

Or aggressively back out any changes from freeze time to tag time.  If  
we don't add the commit hook lock, I will be very strict about this  
come the betas.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCL+KnEjvBPtnXfVAQIkxgQAqXXwZjHyI93L1xEvrIPYGkTugxlgEva/
bj9ip59XqB6EYS8NnciJU29WZhcc3WnEoOsdWk7qwYV0qOc2YOgYh775GF4Q2S/A
5qVw+oePFIGCWMhezVG/JYph8V6T0QL36hhgd78WqBJKa2C7IpKEjh3HATwY8DQL
nouyqdmIDJo=
=Vohh
-----END PGP SIGNATURE-----

From ncoghlan at gmail.com  Thu May  8 15:23:11 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 08 May 2008 23:23:11 +1000
Subject: [Python-3000] [Python-checkins] r62848
	-	python/trunk/Objects/setobject.c
In-Reply-To: <4822F062.7090305@cheimes.de>
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
	<4822F062.7090305@cheimes.de>
Message-ID: <4822FEBF.9060800@gmail.com>

Christian Heimes wrote:
> Barry Warsaw wrote:
>> This is exactly what I'm thinking about!
> 
> -1
> 
> A technical solution never solves a social problem. It's just going to
> cause more social and technical problems.
> 
> All community members with svn write privileges must subscribe to the
> Python developer list. Committers must check the lists prior to a
> check-in if a release is imminent. Releases are announced at least four days
> prior to svn freeze so it's not going to be a problem. The problem often
> lies with occasional committers and maintainers of stdlib packages.
> People need to show more discipline or eventually we have to
> (temporarily) revoke their privileges.

It's actually the time zone issues that get me in relation to code 
freezes... so I just try to avoid committing anything for a day or two :)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From murman at gmail.com  Thu May  8 15:41:20 2008
From: murman at gmail.com (Michael Urman)
Date: Thu, 8 May 2008 08:41:20 -0500
Subject: [Python-3000] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>
References: <20080508043520.B60821E400E@bag.python.org>
	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>
	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
	<4822F062.7090305@cheimes.de>
	<617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>
Message-ID: <dcbbbb410805080641g7b81f029tb147d30e7ca54bc1@mail.gmail.com>

On Thu, May 8, 2008 at 8:20 AM, Barry Warsaw <barry at python.org> wrote:
> Or aggressively back out any changes from freeze time to tag time.  If we
> don't add the commit hook lock, I will be very strict about this come the
> betas.

I know this way is fairly entrenched in the python release process,
but it sounds like it's using the tools incorrectly. In particular,
with subversion it is very easy (compared to cvs) to branch and to switch
branches locally. Why not create a new prerelease branch at the
beginning of freeze and only merge in the critical changes? This way
only the release manager need know or care about the branch, and
nobody else has to really modify his behavior. Then tag, move, and/or
delete the branch as desired.

The obvious stumbling blocks include buildbots not following the new
branch (this could be a blocker), and release scripts possibly needing
modifications if they contain direct svn url references.

-- 
Michael Urman

From barry at python.org  Thu May  8 15:51:29 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 09:51:29 -0400
Subject: [Python-3000] Freeze lifted
Message-ID: <EC9522A2-50CC-4EFF-B2D9-9131D731CAC3@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I've created the tags for 3.0a5 and 2.6a3, and the tarballs look good,  
so I'm lifting the commit freeze for these two branches.  Thanks  
everyone, and look for the release announcements in a little while.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCMFYXEjvBPtnXfVAQKwyAP/bVnUtGzHMaJcwdc6BZR+kZJ0M22k/Vbp
Nk1IfPts3HPKC7cNWzEkpWlqeXnGC0piuqDGrv2igY2Ori7LVMaTOea1xj8L1KqA
QxiSHT0qtkW9J/io/q3Vw4cdXjshUQahSVPL2upafmCF1ROGDM0IKODq6kzjxgGV
I8XI4BciN20=
=NIY+
-----END PGP SIGNATURE-----

From barry at python.org  Thu May  8 16:24:16 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 10:24:16 -0400
Subject: [Python-3000] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <dcbbbb410805080641g7b81f029tb147d30e7ca54bc1@mail.gmail.com>
References: <20080508043520.B60821E400E@bag.python.org>
	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>
	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
	<4822F062.7090305@cheimes.de>
	<617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>
	<dcbbbb410805080641g7b81f029tb147d30e7ca54bc1@mail.gmail.com>
Message-ID: <694A97D7-5CB1-4DC4-B17F-2B157DE89CF4@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 8, 2008, at 9:41 AM, Michael Urman wrote:

> On Thu, May 8, 2008 at 8:20 AM, Barry Warsaw <barry at python.org> wrote:
>> Or aggressively back out any changes from freeze time to tag time.  If we
>> don't add the commit hook lock, I will be very strict about this come the
>> betas.
>
> I know this way is fairly entrenched in the python release process,
> but it sounds like it's using the tools incorrectly. In particular,
> with subversion it is very easy (compared to cvs) to branch and to switch
> branches locally. Why not create a new prerelease branch at the
> beginning of freeze and only merge in the critical changes? This way
> only the release manager need know or care about the branch, and
> nobody else has to really modify his behavior. Then tag, move, and/or
> delete the branch as desired.
>
> The obvious stumbling blocks include buildbots not following the new
> branch (this could be a blocker), and release scripts possibly needing
> modifications if they contain direct svn url references.

I definitely think we'd want the buildbots to track the release  
branches, and it's a bit of a pain to get the release scripts to deal  
with the svn switches.  Right now I think the freeze window is short
enough (barring unforeseen networking snafus) that it's not worth it.
However, once the release process is smooth enough, maybe this little  
freeze hiccup will be worth eliminating.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCMNEHEjvBPtnXfVAQIDogP+NVpyE7AhUS1Eerqv/N+ERTuKnmy/rSNQ
wQhOlAxlvx/lPgm0Mi70C9cA60ogxwGE+nJPf0RQxN2bVfhE/+fvElRl9x7xuoo3
wAK6/zzItqMCP4bpaT8sbsqn4tPB4OCKr0eM/SgZMxrHZkHHZwLTVAw81h40Fmr3
A30V6JpZpdU=
=q3uu
-----END PGP SIGNATURE-----

From guido at python.org  Thu May  8 18:24:35 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 8 May 2008 09:24:35 -0700
Subject: [Python-3000] Fwd: [issue2547] Py30a4 RELNOTES only cover 30a1
	and 30a2
In-Reply-To: <88A090EF-7B71-46F7-9F08-560FCAE07762@python.org>
References: <1210247710.02.0.210073560245.issue2547@psf.upfronthosting.co.za>
	<88A090EF-7B71-46F7-9F08-560FCAE07762@python.org>
Message-ID: <ca471dc20805080924u5626ae03x38ba95abb2d7ae08@mail.gmail.com>

On Thu, May 8, 2008 at 5:03 AM, Barry Warsaw <barry at python.org> wrote:
>  So there was a release critical issue open about making sure to update
> Py3k's RELNOTES file.  I've updated the release script so that I'll be sure
> to edit this file, however I'm not sure what the policy is on updating it.
> Would you expect me to update it and if so, from what data source?  Do we
> list all open critical bugs on the Py3k tracker?  All open PEPs?
>
>  I'd like to ask everyone doing Py3k development to help pitch in and keep
> this file up-to-date.  I think this will be more important as we move to
> beta releases starting next cycle.

I believe I invented this file for the 3.0a1 release, when I realized
that some things were broken but I didn't want to hold up the release
any longer. I also kept adding to it for a while *after* the release,
which seems odd, except that I also copied the updated contents to the
website. Possibly making it public on the website is the main goal of
the file -- it makes users aware of the top ten (say) "gotchas"
without having to scan the bug tracker or ask the mailing list.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at egenix.com  Thu May  8 18:52:02 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 08 May 2008 18:52:02 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
Message-ID: <48232FB2.3020205@egenix.com>

On 2008-05-06 15:55, Atsuo Ishimoto wrote:
> (I changed subject)
> 
> Thank you for your comment.
> 
> On Tue, May 6, 2008 at 8:45 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> 
>>  For sys.stdout this doesn't make sense at all, since it hides encoding
>>  errors for all applications using sys.stdout as piping mechanism.
>>  -1 on that.
> 
> You can raise UnicodeEncodeError for encoding errors if you want, by
> setting sys.stdout's error-handler to `strict`.

No, that's not a good idea. I don't want to change every single
affected application just to make sure that they don't write
corrupt data to stdout.

>>  Both are really way beyond the scope of the PEP and I don't
>>  really see the need for them.
> 
> Even though this PEP was rejected, 

You mean PEP 3138 was rejected ??

> I'll still propose to change
> default error-handler for sys.stdout and for sys.stderr to
> 'backslashreplace'. For Python 2, 'strict' error-handler is acceptable
> because most of text data are 8-bit string, but for Py3K, raising
> exceptions when the printed text contains a character not supported by
> console is annoying.

Well, "annoying" is not good enough for such a big change :-)

Please also consider the different situations you are addressing:

  * console output (ie. printing)
  * stdout file output (ie. piping)
  * interactive session use (ie. running print at the Python prompt)

The backslashreplace idea may have some merits in interactive
Python sessions or IDLE, but it hides encoding errors in all
other situations.
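
To make the trade-off concrete, here is a small sketch of the two error
handlers under discussion, using an ASCII target as a stand-in for a
console or pipe that cannot represent the text (the sample string is an
assumption chosen for illustration):

```python
text = "caf\u00e9 \u65e5\u672c"   # sample text with non-ASCII characters

# 'strict' raises as soon as a character cannot be encoded, so the
# problem is visible to the program doing the writing.
try:
    text.encode("ascii", "strict")
    raised = False
except UnicodeEncodeError:
    raised = True

# 'backslashreplace' never raises: unencodable characters are silently
# turned into escape sequences, and the receiver sees altered data.
escaped = text.encode("ascii", "backslashreplace")

print(raised)
print(escaped)
```

The second form is convenient at an interactive prompt, but a program
reading the piped output has no way to tell that the escapes were not in
the original data.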

>>  They also don't cover the cases
>>  where you write the repr() to a log file, some stream or syslog.
> 
> Sure. I missed some cases, such as cgitb module or logging module.
> I'll investigate them later. If you have another candidate, please let
> me know.

You have to address the general use cases, not just specific
implementations in the Python stdlib - those can easily be changed,
but doing the same in all the existing code out there that wants
to get ported to Py3k is a different issue.

I'm not against changing the repr() of Unicode objects, but
please make sure that this change does not break debugging
Python applications, whether you're debugging an app using
'print' statements, piping repr() through a socket to a remote
debugger or writing information to a log file. The important
factor to take into account is the other end that will receive
the data.

BTW: One problem that your PEP doesn't address, which I mentioned
on the ticket:

By putting all printable chars into the repr() you lose the
ability to actually see the number of code points you have
in a Unicode string.

A Unicode-aware editor, shell or pager
will display the data as glyphs and not as code points, ie.
glyphs expressed using combining code points will appear
as one "character" to the user - even though the Unicode object
contains multiple code points. As a result, the length and
any indexes you might use in the debugging session will not
match what the user sees in his shell window.
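
A short sketch of the mismatch, assuming Python 3 string semantics (the
sample strings are illustrative only):

```python
import unicodedata

decomposed = "e\u0301"   # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)   # single code point U+00E9

# Both usually render as the same glyph, yet they differ in length
# and in how they must be indexed.
print(len(decomposed), len(composed))

# An escaped form makes the individual code points visible again.
print(ascii(decomposed), ascii(composed))
```

A terminal that renders both strings identically gives no hint that
decomposed[1] is a valid index while composed[1] is not.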

>>> - Characters defined in the Unicode character database as
> [snip]
>>  This is all very nice, but if that means that the whole Unicode
>>  database has to be loaded every time the interpreter starts up
>>  as you indicated on the ticket, then I'm firmly -1 against that.
> 
> I changed a patch to add a flag to the _PyUnicode_TypeRecords table,
> so the Unicode database is not loaded at start up.

Thanks.

Please name the property Py_UNICODE_ISPRINTABLE. Py_UNICODE_ISHEXESCAPED
isn't all that intuitive.

And also add your definition from the PEP to unicodectype.c - since
this is not a Unicode standard.

I'd also appreciate it if you could make that property available
as a Unicode method, e.g. .isprintable().

This addition is good on its own.
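
For reference, Python 3 as eventually released does expose such a method
on str.  A brief sketch of how it behaves (the sample characters are
chosen for illustration):

```python
samples = {
    "plain ASCII": "abc",
    "CJK ideograph": "\u65e5",
    "space": " ",
    "newline": "a\nb",
    "zero width space": "\u200b",
}

# isprintable() is True only if every character is printable; control
# and format characters such as "\n" or U+200B make it False.
results = {name: s.isprintable() for name, s in samples.items()}

for name, printable in results.items():
    print(name, printable)
```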

>>  I proposed to make the Unicode repr() output a regular encoding
>>  that's being implemented by a codec. You could then easily
>>  change the encoding to whatever you need for your application
>>  or console.
> 
> I think global setting is not flexible enough. And I see no benefit to
> customizable repr() except to keep compatible with Python 2, but I
> think it is easy to migrate the existing code to the Py3k.

That's what I don't see in your PEP.

How can things easily be changed so that it's possible to get the
Py2.x style hex escaping back into Py3k without having to change
all repr() calls and %r format markers for Unicode objects ?

I can see your point with it being easier to read e.g. German,
Japanese or Korean data, but it still has to be possible to
use repr() for proper debugging which allows the user to
actually see what is stored in a Unicode object in terms of
code points.
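
For comparison, Python 3 as eventually released addresses this with the
ascii() builtin and the matching %a / !a conversions, which give the
escaped form without touching repr().  A small sketch (the sample string
is an assumption):

```python
s = "Gr\u00fc\u00dfe \u65e5\u672c\u8a9e"

readable = repr(s)    # printable non-ASCII characters are shown as-is
escaped = ascii(s)    # every non-ASCII code point is hex-escaped

# The same escaped form is available from both formatting styles.
via_percent = "%a" % s
via_format = "{!a}".format(s)

print(readable)
print(escaped)
```

Existing repr() calls and %r markers still need to be changed by hand to
get the escaped output, which is the migration cost raised above.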

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 08 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From mal at egenix.com  Thu May  8 19:18:06 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 08 May 2008 19:18:06 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482066E3.7030209@gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com>
Message-ID: <482335CE.7000309@egenix.com>

On 2008-05-06 16:10, Nick Coghlan wrote:
> Atsuo Ishimoto wrote:
>>>  I proposed to make the Unicode repr() output a regular encoding
>>>  that's being implemented by a codec. You could then easily
>>>  change the encoding to whatever you need for your application
>>>  or console.
>>
>> I think global setting is not flexible enough. And I see no benefit to
>> customizable repr() except to keep compatible with Python 2, but I
>> think it is easy to migrate the existing code to the Py3k.
> 
> There's a bigger issue with trying to make whatever repr() does a codec 
> in Py3k. As a Unicode->Unicode transformation, it doesn't mesh well with 
> Py3k's strict Unicode->bytes/bytes->Unicode encoding/decoding philosophy.
> 
> That said, it would be nice to have a way to easily stack 
> Unicode->Unicode transforms on top of text IO streams, or byte->byte 
> transforms on top of binary streams.

+1

Here's what I wrote on the ticket for the PEP. I wasn't aware
of that change, otherwise, I'd have commented on this earlier:

> On 2008-05-06 19:10, Guido van Rossum wrote:
>> Guido van Rossum <guido at python.org> added the comment:
>>
>> On Tue, May 6, 2008 at 1:26 AM, Marc-Andre Lemburg wrote:
>>>  So you've limited the codec design to just doing Unicode<->bytes
>>>  conversions ?
>>
>> Yes. This was quite a conscious decision that was not taken lightly,
>> with lots of community input, quite a while ago.
>>
>>>  The original codec design was to have the codec decide which
>>>  types to take on input and to generate on output, e.g. to
>>>  escape characters in Unicode (converting Unicode to Unicode),
>>>  work on compressed 8-bit strings (converting 8-bit strings to
>>>  8-bit strings), etc.
>>
>> Unfortunately this design made it hard to reason about the correctness
>> of code, since (especially in Py3k, where bytes and str are more
>> different than str and unicode were in 2.x) it's hard to write code
>> that uses .encode() or .decode() unless it knows which codec is being
>> used.
>>
>> IOW, when translated to 3.0, the design violates the general design
>> principle that the *type* of a function's or method's return value
>> should not depend on the *value* of one of the arguments.
> 
> I understand where this concept originates and usually apply this
> rule to software design as well, however, in the particular case
> of codecs, the codec registry and its helper functions are merely
> interfaces to code that is defined elsewhere.
> 
> In comparison, the approach is very much like getattr() - you know
> what the attribute is called, but know nothing about its type
> until you receive it from the function.
> 
> The reason codecs were designed like this was to be able to
> easily stack them. For this to work, only the interfaces need
> to be defined, without restricting the codecs too much in terms
> of which types may be used.
> 
> I'd suggest lifting the type restrictions from the general
> codecs.c access APIs (PyCodec_*), since they don't really belong
> there and instead only impose the limitation on PyUnicode and
> PyString methods .encode() and .decode().
> 
> If you then also allow those methods to return *both*
> PyUnicode and PyString, you'd still have strong typing
> (only 1 of two possible types is allowed) and stacking
> streams or having codecs that work on PyUnicode->PyUnicode
> or PyString->PyString would still be accessible via
> .encode()/.decode(). 

-- 
Marc-Andre Lemburg
eGenix.com

From g.brandl at gmx.net  Thu May  8 21:03:45 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 08 May 2008 21:03:45 +0200
Subject: [Python-3000] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>	<4822F062.7090305@cheimes.de>
	<617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>
Message-ID: <fvvika$b35$1@ger.gmane.org>

Barry Warsaw wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On May 8, 2008, at 8:21 AM, Christian Heimes wrote:
> 
>> Barry Warsaw wrote:
>>> This is exactly what I'm thinking about!
>>
>> -1
>>
>> A technical solution never solves a social problem. It's just going to
>> cause more social and technical problems.
> 
> In this case I disagree.  Given our global nature and the vast amounts  
> of email we all get, I think a friendly little svn commit hook  
> reminder is a simple and workable solution.

While I'm +0 on the commit hook, it would help if a mail that announces
a freeze would
- not be hidden in a thread on python-dev and
- have an easily recognizable title, like "[TRUNK FREEZE] ....".

Georg


From tjreedy at udel.edu  Thu May  8 22:49:14 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 8 May 2008 16:49:14 -0400
Subject: [Python-3000] [Python-checkins] r62848
	-python/trunk/Objects/setobject.c
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>	<4822F062.7090305@cheimes.de><617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>
	<fvvika$b35$1@ger.gmane.org>
Message-ID: <fvvp06$1oa$1@ger.gmane.org>

Given that we cannot depend on timely mail/news propagation or on exact 
day-ahead scheduling of a freeze, a current freeze notice either from the 
repository or on a .../dev/status page might work better. 


From tjreedy at udel.edu  Thu May  8 22:55:38 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Thu, 8 May 2008 16:55:38 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>
Message-ID: <fvvpc7$32i$1@ger.gmane.org>

Functions that map unicode->unicode or bytes->bytes could be called 
transcoders.  Each type could be given a .transcode method to go along with 
but contrast with .encode or .decode.
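
No .transcode method exists; as a rough illustration of the same idea,
later Python 3 releases keep such same-type transforms reachable through
the codecs module functions rather than through the str/bytes methods.
A sketch, assuming Python 3.4 or later:

```python
import codecs

# str -> str: ROT13 is a text-to-text transform.
scrambled = codecs.encode("Hello", "rot_13")

# bytes -> bytes: zlib compression is a binary-to-binary transform.
payload = b"spam" * 10
packed = codecs.encode(payload, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")

# str.encode() only accepts text encodings (str -> bytes), so it
# refuses rot_13 instead of returning a str.
try:
    "Hello".encode("rot_13")
    refused = False
except LookupError:
    refused = True

print(scrambled, unpacked == payload, refused)
```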

tjr


From barry at python.org  Fri May  9 01:50:06 2008
From: barry at python.org (Barry Warsaw)
Date: Thu, 8 May 2008 19:50:06 -0400
Subject: [Python-3000] RELEASED Python 2.6a3 and 3.0a5
Message-ID: <88DFD025-8670-42FA-9B73-AFF5193FB0AE@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On behalf of the Python development team and the Python community, I  
am happy to announce the third alpha release of Python 2.6, and the  
fifth alpha release of Python 3.0.

Please note that these are alpha releases, and as such are not  
suitable for production environments.  We continue to strive for a  
high degree of quality, but there are still some known problems and  
the feature sets have not been finalized.  These alphas are being  
released to solicit feedback and hopefully discover bugs, as well as  
allowing you to determine how changes in 2.6 and 3.0 might impact
you.  If you find things broken or incorrect, please submit a bug  
report at

    http://bugs.python.org

For more information and downloadable distributions, see the Python
2.6 website:

    http://www.python.org/download/releases/2.6/

and the Python 3.0 web site:

    http://www.python.org/download/releases/3.0/

These are the last planned alphas for both versions.  If all goes  
well, next month will see the first beta releases of both, which will  
also signal feature freeze.  Two beta releases are planned, with the  
final releases scheduled for September 3, 2008.

See PEP 361 for release details:

     http://www.python.org/dev/peps/pep-0361/

Enjoy,
- -Barry

Barry Warsaw
barry at python.org
Python 2.6/3.0 Release Manager
(on behalf of the entire python-dev team)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCORrnEjvBPtnXfVAQIK+QQAgEUtAvW7uo0BxMiT1bCAo2E9ZecWJ9xe
DBgd/5IK8moITkqhqGAH5UvfytV6uPkOMgGIS/Uvk4hzhU3jwSopEIDJLFQ5nGtC
lCzOHzkDjSNZ8Q2OOAI9mbSHY8grvVxCMB4X2SVXIEMZ6M/X1AcV2b0utp9O1w/l
T/PEvP8U1uY=
=2Tnb
-----END PGP SIGNATURE-----

From g.brandl at gmx.net  Fri May  9 08:02:02 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 09 May 2008 08:02:02 +0200
Subject: [Python-3000] [Python-checkins] r62848
	-python/trunk/Objects/setobject.c
In-Reply-To: <fvvp06$1oa$1@ger.gmane.org>
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>	<4822F062.7090305@cheimes.de><617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>	<fvvika$b35$1@ger.gmane.org>
	<fvvp06$1oa$1@ger.gmane.org>
Message-ID: <g00p6j$gil$1@ger.gmane.org>

Terry Reedy schrieb:
> Given that we cannot depend on timely mail/news propagation or on exact 
> day-ahead scheduling of a freeze, a current freeze notice either from the 
> repository or on a .../dev/status page might work better. 

Nobody is going to look at such a page before making a commit :)

Georg


From humberto at digi.com.br  Fri May  9 09:45:42 2008
From: humberto at digi.com.br (Humberto Diogenes)
Date: Fri, 9 May 2008 04:45:42 -0300
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <ca471dc20805072012y251d3e24q367a7088fb9b1845@mail.gmail.com>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com>
	<ca471dc20805072012y251d3e24q367a7088fb9b1845@mail.gmail.com>
Message-ID: <5ED2EEE7-2846-45A9-90DF-51765E829328@digi.com.br>


On 08/05/2008, at 00:12, Guido van Rossum wrote:

> On Wed, May 7, 2008 at 7:21 PM, Benjamin Peterson
> <musiccomposition at gmail.com> wrote:
>> Can I go ahead and remove this then?
>
> Yes, but let's do it after Barry has released the alphas.
>
> --  
> --Guido van Rossum (home page: http://www.python.org/~guido/)


Hi, Benjamin!

   I noticed you've already removed os.path.walk in r62909, but there  
are still some references to it in the code, as I noticed issuing a  
`make altinstall` on a Mac:
   AttributeError: 'module' object has no attribute 'walk'


   References in .py files:

./Mac/scripts/cachersrc.py:42:        os.path.walk(dir, handler,  
(verbose, force))
./Mac/scripts/zappycfiles.py:25:    os.path.walk(dir, walker, None)
./Mac/Tools/Doc/setup.py:112:        os.path.walk(self.build_html,  
self.visit, None)
./setup.py:1577:        os.path.walk(dirname,  
self.set_dir_modes_visitor, mode)
./Tools/i18n/pygettext.py:344:        os.path.walk(name,  
_visit_pyfiles, list)
./Tools/scripts/findlinksto.py:25:        os.path.walk(dirname, visit,  
prog)
./Tools/versioncheck/checkversions.py:34:    os.path.walk(tree,  
check1dir, None)


   Maybe it would be nice to include some tips about the translation  
from os.path.walk to os.walk in the migration notes, too.
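
As a possible starting point for such migration notes, here is a minimal sketch of how an old-style os.path.walk visitor could be driven by os.walk. The helper name walk_compat and the sample visitor are hypothetical and not part of the standard library; directory ordering may also differ from what os.path.walk produced.

```python
import os


def visit(arg, dirname, names):
    # Old-style os.path.walk callback: called once per directory with the
    # user argument, the directory path and the entries it contains.
    arg.append((dirname, sorted(names)))


def walk_compat(top, func, arg):
    # Hypothetical helper (not in the stdlib): feeds an os.path.walk-style
    # visitor from os.walk.
    for dirname, dirs, files in os.walk(top):
        names = dirs + files
        func(arg, dirname, names)
        # os.path.walk let the visitor prune recursion by editing `names`
        # in place; mirror that onto `dirs`, which os.walk consults next.
        dirs[:] = [d for d in dirs if d in names]
```

A call such as os.path.walk(tree, check1dir, None) would then become walk_compat(tree, check1dir, None), although in most cases rewriting the visitor body as a plain for loop over os.walk is simpler.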

Thanks!
--
Humberto Diógenes
http://humberto.digi.com.br


From stephen at xemacs.org  Fri May  9 10:11:08 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 09 May 2008 17:11:08 +0900
Subject: [Python-3000] [Python-checkins] r62848
	-	python/trunk/Objects/setobject.c
In-Reply-To: <dcbbbb410805080641g7b81f029tb147d30e7ca54bc1@mail.gmail.com>
References: <20080508043520.B60821E400E@bag.python.org>
	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>
	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
	<4822F062.7090305@cheimes.de>
	<617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>
	<dcbbbb410805080641g7b81f029tb147d30e7ca54bc1@mail.gmail.com>
Message-ID: <8763tnnbhf.fsf@uwakimon.sk.tsukuba.ac.jp>

Michael Urman writes:

 > I know this way is fairly entrenched in the python release process,
 > but it sounds like it's using the tools incorrectly. In particular
 > with subversion it is very easy (compared to cvs) to branch and to switch
 > branches locally. Why not create a new prerelease branch at the
 > beginning of freeze and only merge in the critical changes?

Well, speaking from experience:

 - some of the "critical changes" may only get committed on the
   release branch

 - something different from what's in the mainline may get committed
   on the release branch

 - the milestones are on a sideline, not on the mainline.

Getting these points right is essential to ensure that the beta
testers' work is actually relevant to the development process, that
bisection searches work correctly, etc.

 > only the release manager need know or care about the branch, and
 > nobody else has to really modify his behavior.

Behavior modification is the main point of having a release cycle.
Setting deadlines, changing the nature of the patches, bringing issues
to closure, etc.  A release without a freeze is like a sentence
without a period, IMO.

From humberto at digi.com.br  Fri May  9 10:27:56 2008
From: humberto at digi.com.br (Humberto Diogenes)
Date: Fri, 9 May 2008 05:27:56 -0300
Subject: [Python-3000] Removal of os.path.walk
In-Reply-To: <5ED2EEE7-2846-45A9-90DF-51765E829328@digi.com.br>
References: <d78741030804292010q116e90a3o3587cd8fae5a7ccb@mail.gmail.com>
	<1afaf6160805071921l88465eei596d707eb0842575@mail.gmail.com>
	<ca471dc20805072012y251d3e24q367a7088fb9b1845@mail.gmail.com>
	<5ED2EEE7-2846-45A9-90DF-51765E829328@digi.com.br>
Message-ID: <5E8DADB0-DCAD-4B5A-8910-CE8C7357FCE0@digi.com.br>


On 09/05/2008, at 04:45, Humberto Diogenes wrote:

>  I noticed you've already removed os.path.walk in r62909, but there  
> are still some references to it in the code, as I noticed issuing a  
> `make altinstall` on a Mac:
>  AttributeError: 'module' object has no attribute 'walk'


Here's the fix for the installation issue:


Index: setup.py
===================================================================
--- setup.py	(revision 62932)
+++ setup.py	(working copy)
@@ -1574,13 +1574,10 @@

      def set_dir_modes(self, dirname, mode):
          if not self.is_chmod_supported(): return
-        os.path.walk(dirname, self.set_dir_modes_visitor, mode)
+        for root, dirs, files in os.walk(dirname):
+            log.info("changing mode of %s to %o" % (root, mode))
+            if not self.dry_run: os.chmod(root, mode)

-    def set_dir_modes_visitor(self, mode, dirname, names):
-        if os.path.islink(dirname): return
-        log.info("changing mode of %s to %o", dirname, mode)
-        if not self.dry_run: os.chmod(dirname, mode)
-
      def is_chmod_supported(self):
          return hasattr(os, 'chmod')


I don't even know if this is really necessary, as it seems to run in  
one directory only:
changing mode of /usr/local/lib/python3.0/lib-dynload/ to 755
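
One difference in the patch above is that the removed visitor skipped symlinks, while the new loop does not. os.walk does not descend into symlinked directories by default, so the check only matters for the top directory. A standalone sketch that keeps it, assuming a plain function instead of the setup.py method (the dry_run parameter and the returned list are illustrative additions):

```python
import os


def set_dir_modes(dirname, mode, dry_run=False):
    # Standalone sketch; in setup.py this is a method that logs each change.
    changed = []
    for root, dirs, files in os.walk(dirname):
        if os.path.islink(root):
            continue  # same check the old visitor performed
        if not dry_run:
            os.chmod(root, mode)
        changed.append(root)
    return changed
```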


--
Humberto Diógenes
http://humberto.digi.com.br


From mal at egenix.com  Fri May  9 12:44:01 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 09 May 2008 12:44:01 +0200
Subject: [Python-3000] [Python-Dev] [Python-checkins] r62848
	-	python/trunk/Objects/setobject.c
In-Reply-To: <18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
Message-ID: <48242AF1.906@egenix.com>

On 2008-05-08 13:59, Barry Warsaw wrote:
> On May 8, 2008, at 7:54 AM, Benjamin Peterson wrote:
> 
>> On Thu, May 8, 2008 at 6:32 AM, Barry Warsaw <barry at python.org> wrote:
>>> Since the trunk buildbots appear to be mostly happy (well those that are
>>> connected anyway), and because I couldn't get the releases out last 
>>> night,
>>> I'll let this one slide.  I'd like to find a way to more forcefully 
>>> enforce
>>> commit freezes for the betas though.
> 
>> I wonder if you couldn't alter the server side commit hook to reject
>> everything with the message "Sorry, we're in a freeze." (You'd have to
>> make an exception for yourself.)
> 
> This is exactly what I'm thinking about!

+1, that's easy to do with Subversion and doesn't hurt anyone.

Please also use a term like "freeze" or "frozen" in the subject line
of the announcement - perhaps even in capital letters.
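
The commit hook idea quoted above could look roughly like this as a Subversion pre-commit hook. This is only a sketch: the FROZEN switch and the exempt account name are hypothetical, and a real deployment would need some way for the release manager to toggle the freeze.

```python
import subprocess
import sys

FROZEN = True                  # assumption: toggled by the release manager
EXEMPT = {"release.manager"}   # hypothetical account allowed to commit


def check_commit(author, frozen=FROZEN, exempt=EXEMPT):
    """Return an error message if the commit must be rejected, else None."""
    if frozen and author not in exempt:
        return "Sorry, we're in a freeze."
    return None


def main(argv):
    # Subversion calls pre-commit with the repository path and the
    # transaction name as its two arguments.
    repos, txn = argv[1], argv[2]
    author = subprocess.check_output(
        ["svnlook", "author", "-t", txn, repos]).decode().strip()
    message = check_commit(author)
    if message:
        sys.stderr.write(message + "\n")  # stderr is relayed to the committer
        return 1                          # non-zero exit aborts the commit
    return 0

# An installed hook script would end with: sys.exit(main(sys.argv))
```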

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 09 2008)
>>> Python/Zope Consulting and Support ...        http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From mal at egenix.com  Fri May  9 12:54:02 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 09 May 2008 12:54:02 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fvvpc7$32i$1@ger.gmane.org>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org>
Message-ID: <48242D4A.3060802@egenix.com>

On 2008-05-08 22:55, Terry Reedy wrote:
> Functions that map unicode->unicode or bytes->bytes could be called 
> transcoders.  Each type could be given a .transcode method to go along with 
> but contrast with .encode or .decode.

Are you suggesting to have two separate methods which then
allow same-type-conversions ? One for encoding to the same
type and one for decoding ?

Fine with me.

They do have to map naturally to the codec method encode and
decode, though, so a single method won't do, unless maybe
you add a parameter to define the direction of the coding
process.

In summary, I'd just like to see the following happen:

  * revert the type restrictions on the PyCodec_* API

  * enforce the restrictions on the .encode() and .decode()
    methods of PyUnicode and PyString objects (str and bytes)

  * add a way to PyUnicode and PyString objects (str and bytes)
    to allow same type encoding and decoding
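
For illustration only, this is how the split looks in later Python 3 releases (3.4 and newer), which is not necessarily what was being agreed in this thread: the codecs module functions accept same-type codecs, while the str method is restricted to text encodings.

```python
import codecs

# str -> str transform through the codecs API
scrambled = codecs.encode("hello", "rot_13")
print(scrambled)                               # uryyb
print(codecs.decode(scrambled, "rot_13"))      # hello

# bytes -> bytes transform through the codecs API
packed = codecs.encode(b"abc", "hex_codec")
print(packed)                                  # b'616263'
print(codecs.decode(packed, "hex_codec"))      # b'abc'

# The str.encode() method only accepts text encodings (str -> bytes).
try:
    "hello".encode("rot_13")
    method_rejected = False
except LookupError:
    method_rejected = True
print(method_rejected)                         # True on Python 3.4+
```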

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 09 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From barry at python.org  Fri May  9 14:22:32 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 9 May 2008 08:22:32 -0400
Subject: [Python-3000] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <fvvika$b35$1@ger.gmane.org>
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>	<4822F062.7090305@cheimes.de>
	<617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>
	<fvvika$b35$1@ger.gmane.org>
Message-ID: <520D2894-2296-4C6D-97FF-8521520E8E81@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 8, 2008, at 3:03 PM, Georg Brandl wrote:
>
> While I'm +0 on the commit hook, it would help if a mail that  
> announces
> a freeze would
> - not be hidden in a thread on python-dev and
> - have an easily recognizable title, like "[TRUNK FREEZE] ....".

I will make the freeze announcement more recognizable in the future,  
but I also want to point out that the entire release schedule has been  
published far in advance in PEP 361.  At this point, the freeze dates  
should come as no surprise.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCRCCnEjvBPtnXfVAQKJgAQAojZ5vIg2K4q4e+XEHogQKeFjxkh5+o6U
eWDjmkeVImwe1Sylb+mCqrxQ7JNY6d1m35hQsna/Ghan1IVIQ857fCBXS84aIUGl
AGAnbrzxAt7RoYz/dyhz2twf1Uui5OVGOCYnmZ3ExZhTrEHN7ze43C+Blir0sH+4
DCuDj4xmpMM=
=6W75
-----END PGP SIGNATURE-----

From skip at pobox.com  Fri May  9 14:15:21 2008
From: skip at pobox.com (skip at pobox.com)
Date: Fri, 9 May 2008 07:15:21 -0500
Subject: [Python-3000] Code Freeze - full or partial?
In-Reply-To: <48242AF1.906@egenix.com>
References: <20080508043520.B60821E400E@bag.python.org>
	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>
	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
	<48242AF1.906@egenix.com>
Message-ID: <18468.16473.206010.819980@montanaro-dyndns-org.local>


In the past I seem to recall that the Python code proper might be frozen
(for a day or two) before a release, but that it was okay to still commit
changes to non-code files such as documentation or files in Misc.  Is this
still the case in the new release-early-release-often regime?  Is the
intention to make the duration of the code freeze so short (a few minutes or
hours) that it's not worth the effort to make this distinction?

Skip

From barry at python.org  Fri May  9 15:25:17 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 9 May 2008 09:25:17 -0400
Subject: [Python-3000] [Python-Dev] [Python-checkins] r62848
	-	python/trunk/Objects/setobject.c
In-Reply-To: <48242AF1.906@egenix.com>
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
	<48242AF1.906@egenix.com>
Message-ID: <308D6BF1-936D-45F2-960B-0A42D04186A5@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 9, 2008, at 6:44 AM, M.-A. Lemburg wrote:

> On 2008-05-08 13:59, Barry Warsaw wrote:
>> On May 8, 2008, at 7:54 AM, Benjamin Peterson wrote:
>>> On Thu, May 8, 2008 at 6:32 AM, Barry Warsaw <barry at python.org>  
>>> wrote:
>>>> Since the trunk buildbots appear to be mostly happy (well those  
>>>> that are
>>>> connected anyway), and because I couldn't get the releases out  
>>>> last night,
>>>> I'll let this one slide.  I'd like to find a way to more  
>>>> forcefully enforce
>>>> commit freezes for the betas though.
>>> I wonder if you couldn't alter the server side commit hook to reject
>>> everything with the message "Sorry, we're in a freeze." (You'd  
>>> have to
>>> make an exception for yourself.)
>> This is exactly what I'm thinking about!
>
> +1, that's easy to do with Subversion and doesn't hurt anyone.

Agreed.  Look for it for the first beta.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCRQvXEjvBPtnXfVAQKyLwP8D0AVX+jgvy04hM207eeWRZb3JcHMtZuP
ZcOuBQsCsVFppCxAreYIwfa0e6TD2LHBV4uz/G7Nxt6qNI6SY7lHQezNg4RezFwJ
e93HAGdD0djj4BrL/xCr0wrK6wCwjodcvcjFdqTjEdLnkS7KGM9ooW8ZdYjQp6jI
E+ZLDdhQ/KY=
=24yM
-----END PGP SIGNATURE-----

From barry at python.org  Fri May  9 15:30:11 2008
From: barry at python.org (Barry Warsaw)
Date: Fri, 9 May 2008 09:30:11 -0400
Subject: [Python-3000] Code Freeze - full or partial?
In-Reply-To: <18468.16473.206010.819980@montanaro-dyndns-org.local>
References: <20080508043520.B60821E400E@bag.python.org>
	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>
	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>
	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>
	<48242AF1.906@egenix.com>
	<18468.16473.206010.819980@montanaro-dyndns-org.local>
Message-ID: <F6FAAC89-38AF-4315-9909-B93D9D37171E@python.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On May 9, 2008, at 8:15 AM, skip at pobox.com wrote:

> In the past I seem to recall that the Python code proper might be  
> frozen
> (for a day or two) before a release, but that it was okay to still  
> commit
> changes to non-code files such as documentation or files in Misc.   
> Is this
> still the case in the new release-early-release-often regime?  Is the
> intention to make the duration of the code freeze so short (a few  
> minutes or
> hours) that it's not worth the effort to make this distinction?

For the alphas, that's certainly been the case because it hasn't been  
necessary to coordinate all the Experts.  IOW, it's okay for the  
Windows installer to get uploaded a few hours after the tarballs.

For the betas, rcs and finals, I think we want a little bit more  
coordination (correct me if you disagree).  So in that case, there may  
be a longer freeze.  Even in that case, I don't envision more than a  
24 hour freeze hopefully.

- -Barry

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.8 (Darwin)

iQCVAwUBSCRR5HEjvBPtnXfVAQLa+gP8CL9koa5eGBvP8g+CA8l61SIuluHNbPkq
SH7uOiPMeuIX392xy82ixnXjYTlCJn9epWouYkiWta3GA+ZaCcmTFFavZ3ZbLbE3
uxfzhCWsZ5EUW5/iDCOUrlEwuxXJ6FU4naRTaTCBTELXRKvb3sI5C2pFjrb6JTZc
hP2hP6m+A2Y=
=avCD
-----END PGP SIGNATURE-----

From steve at holdenweb.com  Fri May  9 15:31:43 2008
From: steve at holdenweb.com (Steve Holden)
Date: Fri, 09 May 2008 09:31:43 -0400
Subject: [Python-3000] [Python-checkins] r62848 -
	python/trunk/Objects/setobject.c
In-Reply-To: <520D2894-2296-4C6D-97FF-8521520E8E81@python.org>
References: <20080508043520.B60821E400E@bag.python.org>	<718F3CF3-E61B-43DC-81CC-E6EF0AAB9403@python.org>	<1afaf6160805080454m3e0422b5sb9b7931591b703c3@mail.gmail.com>	<18BD2693-5453-450D-A7C4-6B4B6F5C3AB9@python.org>	<4822F062.7090305@cheimes.de>	<617FFC34-8C94-48ED-951D-59F3467E6DC7@python.org>	<fvvika$b35$1@ger.gmane.org>
	<520D2894-2296-4C6D-97FF-8521520E8E81@python.org>
Message-ID: <g01jol$2p0$1@ger.gmane.org>

Barry Warsaw wrote:
> On May 8, 2008, at 3:03 PM, Georg Brandl wrote:
> 
>> While I'm +0 on the commit hook, it would help if a mail that announces
>> a freeze would
>> - not be hidden in a thread on python-dev and
>> - have an easily recognizable title, like "[TRUNK FREEZE] ....".
> 
> I will make the freeze announcement more recognizable in the future, but 
> I also want to point out that the entire release schedule has been 
> published far in advance in PEP 361.  At this point, the freeze dates 
> should come as no surprise.
> 
A python-dev calendar on Google Calendars? That would give us one more 
warning to ignore :-)

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From eric+python-dev at trueblade.com  Fri May  9 17:24:35 2008
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 09 May 2008 11:24:35 -0400
Subject: [Python-3000] Adding 'n' format presentation type to integers
Message-ID: <48246CB3.7060504@trueblade.com>

'n' is like 'g', but adds locale-specific thousands separators.

Issue 2802 (http://bugs.python.org/issue2802) points out that 'n' 
formatting isn't useful for integers, because it first converts to 
float.  There's no way to get 1,000,000 as a result, since 'g' converts 
to '1e+06'.

I propose adding 'n' as an integer format presentation type to PEP 3101. 
  The definition would be:

'n' - Number. This is the same as 'd', except that it uses the
               current locale setting to insert the appropriate
               number separator characters.

I already have the C code needed to implement this in Python/pystrtod.c 
(for floats), so it would just take some refactoring to get the integer 
formatter to use it.

If there is agreement, I'll update the PEP and implement this in 2.6 and 
3.0.
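
A small sketch of the behaviour being proposed, as it can be tried in a Python where 'n' is accepted for integers. The helper format_in_locale is hypothetical, and the "en_US.UTF-8" locale name is an assumption that may not be installed on every system, in which case the helper returns None.

```python
import locale


def format_in_locale(value, spec, locale_name):
    """Format value with LC_NUMERIC set to locale_name; None if unavailable."""
    saved = locale.setlocale(locale.LC_NUMERIC)
    try:
        locale.setlocale(locale.LC_NUMERIC, locale_name)
    except locale.Error:
        return None
    try:
        return format(value, spec)
    finally:
        locale.setlocale(locale.LC_NUMERIC, saved)


print(format(1000000.0, "g"))                  # 1e+06
print(format_in_locale(1000000, "n", "C"))     # 1000000 (the C locale has no grouping)
print(format_in_locale(1000000, "n", "en_US.UTF-8"))  # grouped, if the locale exists
```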

Eric.

From guido at python.org  Fri May  9 18:06:59 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 9 May 2008 09:06:59 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <48242D4A.3060802@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
Message-ID: <ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>

On Fri, May 9, 2008 at 3:54 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-05-08 22:55, Terry Reedy wrote:
>>
>> Functions that map unicode->unicode or bytes->bytes could be called
>> transcoders.  Each type could be given a .transcode method to go along with
>> but contrast with .encode or .decode.
>
> Are you suggesting to have two separate methods which then
> allow same-type-conversions ? One for encoding to the same
> type and one for decoding ?
>
> Fine with me.
>
> They do have to map naturally to the codec method encode and
> decode, though, so a single method won't do, unless maybe
> you add a parameter to define the direction of the coding
> process.
>
> In summary, I'd just like to see the following happen:
>
>  * revert the type restrictions on the PyCodec_* API
>
>  * enforce the restrictions on the .encode() and .decode()
>   methods of PyUnicode and PyString objects (str and bytes)
>
>  * add a way to PyUnicode and PyString objects (str and bytes)
>   to allow same type encoding and decoding

+1

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ishimoto at gembook.org  Fri May  9 19:23:07 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Sat, 10 May 2008 02:23:07 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <48232FB2.3020205@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48232FB2.3020205@egenix.com>
Message-ID: <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>

On Fri, May 9, 2008 at 1:52 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>  For sys.stdout this doesn't make sense at all, since it hides encoding
>>>  errors for all applications using sys.stdout as piping mechanism.
>>>  -1 on that.
>>
>> You can raise UnicodeEncodeError for encoding errors if you want, by
>> setting sys.stdout's error-handler to `strict`.
>
> No, that's not a good idea. I don't want to change every single
> affected application just to make sure that they don't write
> corrupt data to stdout.

The changes you would need to make to your applications are so small
that I don't think this is a valid argument.
And the number of applications you need to change will be rather small.
What you call "corrupt data" are just hex-escaped characters of a
foreign language. In most cases, printing (or writing to a file) such a
string does no harm, so I think raising an exception by default is
overkill. Java doesn't raise an exception for an encoding error, but just
prints `?`. .NET languages such as C# also print '?'. Perl prints a
hex-escaped string, as proposed in this PEP.

>> Even though this PEP was rejected,
>
> You mean PEP 3138 was rejected ??

Er, I should have written "Even if this PEP was ...", perhaps.

> Well, "annoying" is not good enough for such a big change :-)

So? Annoyance with Perl was enough reason for me to switch languages entirely :-)

> The backslashreplace idea may have some merits in interactive
> Python sessions or IDLE, but it hides encoding errors in all
> other situations.

Encoding errors are not hidden, but are represented by hex-escaped
strings. We can get much more information about the string being
printed than from printing tracebacks.
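
To make the comparison concrete, here is a small sketch of the error handlers under discussion, using an ASCII-only byte stream as a stand-in for a limited sys.stdout. The sample text and the in-memory stream are illustrative assumptions.

```python
import io

text = "\u65e5\u672c\u8a9e"  # three Japanese characters

# 'backslashreplace' keeps the information as hex escapes.
print(text.encode("ascii", "backslashreplace"))   # b'\\u65e5\\u672c\\u8a9e'

# 'replace' loses it.
print(text.encode("ascii", "replace"))            # b'???'

# 'strict' raises instead of writing anything.
try:
    text.encode("ascii", "strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True
print(strict_raised)                              # True

# A text stream configured the way the PEP discussion suggests for stdout.
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii",
                       errors="backslashreplace", newline="\n")
out.write(text + "\n")
out.flush()
print(raw.getvalue())                             # b'\\u65e5\\u672c\\u8a9e\n'
```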

> I'm not against changing the repr() of Unicode objects, but
> please make sure that this change does not break debugging
> Python applications. Whether you're debugging an app using
> 'print' statements, piping repr() through a socket to a remote
> debugger or writing information to a log file. The important
> factor to take into account is the other end that will receive
> the data.

I think your request is too vague to act on. This proposal improves
the currently broken debugging experience for me, and I see no information
lost for debugging. But the "other end" may vary too much to say anything definite.

> BTW: One problem that your PEP doesn't address, which I mentioned
> on the ticket:
>
> By putting all printable chars into the repr() you lose the
> ability to actually see the number of code points you have
> in a Unicode string.
>

With the current repr(), I cannot get any information other than the number
of code points. That is not what I want to learn from printing repr().
For the length of the string, I'll just do print(len(s)).

>
> Please name the property Py_UNICODE_ISPRINTABLE. Py_UNICODE_ISHEXESCAPED
> isn't all that intuitive.

The name `Py_UNICODE_ISPRINTABLE` came to my mind at first, but I was
not sure that `printable` is an accurate word. I'm okay with
Py_UNICODE_ISPRINTABLE, but I'd like to hear opinions. If no one
objects to Py_UNICODE_ISPRINTABLE, I'll go for it.

>
> How can things easily be changed so that it's possible to get the
> Py2.x style hex escaping back into Py3k without having to change
> all repr() calls and %r format markers for Unicode objects ?

I didn't intend to imply "without having to change".  Perhaps
"migrate" is the wrong word and "port" would be better.

In most cases, repr() calls and %r format markers are unlikely to need
changes. They need to be changed only if pure ASCII is required even
though your locale is capable of printing the strings.

> I can see your point with it being easier to read e.g. German,
> Japanese or Korean data, but it still has to be possible to
> use repr() for proper debugging which allows the user to
> actually see what is stored in a Unicode object in terms of
> code points.

You can see code points easily; the function I wrote in the PEP to
convert such strings to the Python 2 style repr() is a good example. But I
believe ordinary use cases prefer readable strings over code points.
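
A short sketch of both views of the same string, assuming a Python 3 with the PEP 3138 behaviour in place. The ascii() builtin and the backslashreplace round trip both give the escaped, Python 2 style form; the sample string is illustrative.

```python
s = "\u65e5\u672c\u8a9e"  # three Japanese characters

readable = repr(s)         # printable characters are shown as they are
escaped = ascii(s)         # code points as hex escapes
via_codec = repr(s).encode("ascii", "backslashreplace").decode("ascii")

print(escaped)             # '\u65e5\u672c\u8a9e'
print(via_codec == escaped)  # True
print(len(s))              # 3
```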

From dalcinl at gmail.com  Fri May  9 23:35:12 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Fri, 9 May 2008 18:35:12 -0300
Subject: [Python-3000] about the status of PyNumberMethods
Message-ID: <e7ba66e40805091435w467059b2x2b0e3a6bf0d42b57@mail.gmail.com>

Yesterday I was working on a patch for Cython to make the generated C
code work from Python 2.3 to 2.6 and also 3.0.

After four hours of carefully diving into the Python sources from 2.3 to 3.0
and finishing the patch, the only thing I would object to in the
current codebase of Py3K is the status of PyNumberMethods.

A slot changed its name (nb_nonzero to nb_bool), some slots are gone
(nb_[inplace_]divide) and others are unused (nb_hex, nb_oct, and
nb_coerce). What are the long term plans for this?
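
For readers more familiar with the Python-level names, here is a rough sketch of the special methods that correspond to the slots mentioned above in Python 3. The Meters class is purely illustrative.

```python
class Meters:
    """Illustrative numeric wrapper; not taken from any real code base."""

    def __init__(self, value):
        self.value = value

    def __bool__(self):               # nb_bool (was nb_nonzero / __nonzero__)
        return self.value != 0

    def __truediv__(self, other):     # nb_true_divide; classic nb_divide is gone
        return Meters(self.value / other)

    def __floordiv__(self, other):    # nb_floor_divide
        return Meters(self.value // other)

    def __index__(self):              # nb_index; hex() and oct() use this
        return int(self.value)


print(bool(Meters(0)))                # False
print((Meters(6) / 4).value)          # 1.5
print((Meters(7) // 2).value)         # 3
print(hex(Meters(255)))               # 0xff
```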

BTW, I was also looking at the very, very clever hackery implementing
the method cache for types. My English is crude, but perhaps the
Py_TPFLAGS_[HAVE|VALID]_VERSION_TAG could be renamed to something like
XXX_MCACHE_TAG or XXX_METHODCACHE_TAG, which IMHO is more descriptive
of what those flags are intended for...

-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From tjreedy at udel.edu  Fri May  9 23:52:22 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 9 May 2008 17:52:22 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com><fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>
Message-ID: <g02h2n$a9a$1@ger.gmane.org>


"M.-A. Lemburg" <mal at egenix.com> wrote in message 
news:48242D4A.3060802 at egenix.com...
| On 2008-05-08 22:55, Terry Reedy wrote:
| > Functions that map unicode->unicode or bytes->bytes could be called
| > transcoders.  Each type could be given a .transcode method to go along
| > with but contrast with .encode or .decode.

My main idea is that we can both keep current functionality *and* the new 
restriction on usage of .encode() and .decode() (which *does* make things 
less confusing at least for me).

| Are you suggesting to have two separate methods which then
| allow same-type-conversions ? One for encoding to the same
| type and one for decoding ?

I only suggested the possibility of one because I was thinking of 
transcoders more generally than those in definite 'encode'/'decode' pairs. 
A lossy encoder needs a decoder just to do the reverse type conversion. 
But a lossy transcoder whose natural partner is the identity function does 
not.  At least not conceptually.  (Example for bytes: map most control 
chars to 0 and any above 127 to 127.)  Another difference is that 
transcoders can be chained in a way that encoders (or decoders, both in the 
class-changing sense) cannot.
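
The bytes example and the chaining point can be sketched as follows. Which control characters count as "most" is an assumption here (tab, LF and CR are kept), and the names squash and chain are hypothetical.

```python
KEEP = {9, 10, 13}  # assumption: tab, LF and CR are left alone

# Lossy bytes->bytes table: other control chars -> 0, anything above 127 -> 127.
SQUASH_TABLE = bytes(
    0 if (b < 32 and b not in KEEP) else 127 if b > 127 else b
    for b in range(256)
)


def squash(data):
    return data.translate(SQUASH_TABLE)


def chain(*transcoders):
    # Same-type functions compose freely, unlike type-changing encoders.
    def run(data):
        for transcode in transcoders:
            data = transcode(data)
        return data
    return run


upper_then_squash = chain(bytes.upper, squash)
print(squash(b"a\x01b\xffc\n"))         # b'a\x00b\x7fc\n'
print(upper_then_squash(b"ab\x80"))     # b'AB\x7f'
```

Applying squash twice gives the same result as applying it once, which is why its natural partner is the identity function and no separate decoder is needed.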

Thinking more, I realize that there are byte transcoders scattered across 
several modules and they are not going to be consolidated.  Perhaps only 
unicode 'transcoders' are needed.  But not for me to decide.

| Fine with me.

I do not really have a hat in this ring, so details are for others to 
decide.

| They do have to map naturally to the codec method encode and
| decode, though, so a single method won't do, unless maybe
| you add a parameter to define the direction of the coding
| process.

It was an open question to me whether to reuse codecs or make a new 
transcoders module.  But ditto my last comment.

Terry Jan Reedy


From eric+python-dev at trueblade.com  Sun May 11 05:16:56 2008
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Sat, 10 May 2008 23:16:56 -0400
Subject: [Python-3000] Adding 'n' format presentation type to integers
In-Reply-To: <48246CB3.7060504@trueblade.com>
References: <48246CB3.7060504@trueblade.com>
Message-ID: <48266528.4080400@trueblade.com>

Eric Smith wrote:
> 'n' is like 'g', but adds locale-specific thousands separators.
> 
> Issue 2802 (http://bugs.python.org/issue2802) points out that 'n' 
> formatting isn't useful for integers, because it first converts to 
> float.  There's no way to get 1,000,000 as a result, since 'g' converts 
> to '1e+06'.
> 
> I propose adding 'n' as an integer format presentation type to PEP 3101. 
>  The definition would be:
> 
> 'n' - Number. This is the same as 'd', except that it uses the
>               current locale setting to insert the appropriate
>               number separator characters.
> 
> I already have the C code needed to implement this in Python/pystrtod.c 
> (for floats), so it would just take some refactoring to get the integer 
> formatter to use it.
> 
> If there is agreement, I'll update the PEP and implement this in 2.6 and 
> 3.0.

Having heard no objections, I'll update the PEP and check in the change.

Eric.


From g.brandl at gmx.net  Sun May 11 22:58:50 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 11 May 2008 22:58:50 +0200
Subject: [Python-3000] CGI module - remove backward-compatibility classes?
Message-ID: <g07mmp$9si$1@ger.gmane.org>

The CGI module has some classes that are marked as "backwards compatibility
only". They are not formally deprecated in the docs, but this can be done
for 2.6. Should we remove them in 3.0?

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From guido at python.org  Sun May 11 23:21:21 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 11 May 2008 14:21:21 -0700
Subject: [Python-3000] CGI module - remove backward-compatibility
	classes?
In-Reply-To: <g07mmp$9si$1@ger.gmane.org>
References: <g07mmp$9si$1@ger.gmane.org>
Message-ID: <ca471dc20805111421w7ac9b127q7e8f0d8741e6255a@mail.gmail.com>

On 5/11/08, Georg Brandl <g.brandl at gmx.net> wrote:
> The CGI module has some classes that are marked as "backwards compatibility
> only". They are not formally deprecated in the docs, but this can be done
> for 2.6. Should we remove them in 3.0?
>
> Georg
>

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Sun May 11 23:22:00 2008
From: guido at python.org (Guido van Rossum)
Date: Sun, 11 May 2008 14:22:00 -0700
Subject: [Python-3000] CGI module - remove backward-compatibility
	classes?
In-Reply-To: <g07mmp$9si$1@ger.gmane.org>
References: <g07mmp$9si$1@ger.gmane.org>
Message-ID: <ca471dc20805111422r43ed65d4yd7457f793fa85457@mail.gmail.com>

+1

On 5/11/08, Georg Brandl <g.brandl at gmx.net> wrote:
> The CGI module has some classes that are marked as "backwards compatibility
> only". They are not formally deprecated in the docs, but this can be done
> for 2.6. Should we remove them in 3.0?
>
> Georg
>

--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Sun May 11 23:43:05 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 11 May 2008 23:43:05 +0200
Subject: [Python-3000] CGI module - remove backward-compatibility
	classes?
In-Reply-To: <ca471dc20805111422r43ed65d4yd7457f793fa85457@mail.gmail.com>
References: <g07mmp$9si$1@ger.gmane.org>
	<ca471dc20805111422r43ed65d4yd7457f793fa85457@mail.gmail.com>
Message-ID: <g07p9o$gu4$1@ger.gmane.org>

Done in r63099.

Georg

Guido van Rossum schrieb:
> +1
> 
> On 5/11/08, Georg Brandl <g.brandl at gmx.net> wrote:
>> The CGI module has some classes that are marked as "backwards compatibility
>> only". They are not formally deprecated in the docs, but this can be done
>> for 2.6. Should we remove them in 3.0?
>>
>> Georg
>>
> 




From mal at egenix.com  Wed May 14 18:18:39 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 14 May 2008 18:18:39 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<48232FB2.3020205@egenix.com>
	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
Message-ID: <482B10DF.50105@egenix.com>

Atsuo,

you are not really addressing my arguments in your reply.

My main concern is that repr(unicode) as well as '%r' is used
a lot in logging and debugging of applications.

In the 2.x series of Python, the output of repr() has traditionally
always been plain ASCII and does not require any special encoding
and also doesn't run into problems when mixing the output with
other encodings used in the log file, on the console or wherever
the output of repr() is sent.

You are now suggesting to break this convention by allowing
all printable code points to be used in the repr() output.
Depending on where you send the repr() output and the contents
of the PyUnicode object, this will likely result in exceptions
in the .write() method of the stream object.

Just adjusting sys.stdout and sys.stderr to prevent them from
falling over is not enough (and is indeed not within the scope
of the PEP, since those changes are *major* and not warranted
for just getting your Unicode repr() to work). repr() is very
often written to log files and those would all have to be
changed as well.

Now, as I've said before, I can see your point about wanting
to be able to read the Unicode code points, even if you use
repr() - instead of the more straightforward .encode()
approach. However, when suggesting such changes, you always
have to see the other side as well:

  - Are there alternative ways to get the "problem" fixed ?
  - Is the added convenience worth breaking existing conventions ?
  - Is it worth breaking existing applications ?

I've suggested making the repr() output configurable to address
the convenience aspect of your proposal. You could then set the
output encoding to e.g. "unicode-printable" and get your preferred
output. The default could remain set to the current all-ASCII output.

Hardwiring the encoding is not a good idea, esp. since there
are lots of alternatives for you to get readable output from
PyUnicode objects now and without any changes to the interpreter.

E.g.

print '%s' % u.encode('utf-8')

or

print '%s' % u.encode('shift-jis')

or

logfile = open('my.log', encoding='unicode-printable')
logfile.write(u)

or

def unicode_repr(u):
     return u.encode('unicode-printable')
print '%s' % unicode_repr(u)

There are many ways to solve your problem.

In summary, I am:

  -1 on hardwiring the unicode repr() output to a non-ASCII
     encoding

  +1 on adding the PyUnicode_ISPRINTABLE() API

  +1 on adding a unicode-printable codec which implements
     your suggested encoding, so that you can use it for e.g.
     log files or as sys.stdout encoding

  +0 on making unicode repr() encoding adjustable
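
As a point of comparison, here is a sketch of the two renderings under
discussion. It assumes a Python 3 interpreter in which repr() keeps
printable non-ASCII characters, as PEP 3138 proposes, and in which the
built-in ascii() gives the 2.x-style all-ASCII form.

```python
s = "caf\u00e9"  # 'café', spelled with an escape so the source stays ASCII

readable = repr(s)   # printable non-ASCII characters kept as-is
escaped = ascii(s)   # non-ASCII characters hex-escaped

print(escaped)       # safe on any stream, even an ASCII-only one
print(len(s))        # number of code points, independent of the rendering
```

The escaped form is the one that can be written to a log file of
unknown encoding without risking an exception in .write().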

Regards,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 14 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611


On 2008-05-09 19:23, Atsuo Ishimoto wrote:
> On Fri, May 9, 2008 at 1:52 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>  For sys.stdout this doesn't make sense at all, since it hides encoding
>>>>  errors for all applications using sys.stdout as piping mechanism.
>>>>  -1 on that.
>>> You can raise UnicodeEncodeError for encoding errors if you want, by
>>> setting sys.stdout's error-handler to `strict`.
>> No, that's not a good idea. I don't want to change every single
>> affected application just to make sure that they don't write
>> corrupt data to stdout.
> 
> The changes you need to make to your applications will be so small
> that I don't think this is a valid argument.
> And the number of applications you need to change will be rather small.
> What you call "corrupt data" are just hex-escaped characters of a
> foreign language. In most cases, printing (or writing to a file) such a
> string does no harm, so I think raising an exception by default is
> overkill. Java doesn't raise an exception for encoding errors, but just
> prints `?`. .NET languages such as C# also print '?'. Perl prints
> hex-escaped strings, as proposed in this PEP.
> 
>>> Even though this PEP was rejected,
>> You mean PEP 3138 was rejected ??
> 
> Er, I should have written "Even if this PEP was ...", perhaps.
> 
>> Well, "annoying" is not good enough for such a big change :-)
> 
> So? Annoyance with Perl was reason enough for me to change languages entirely :-)
> 
>> The backslashreplace idea may have some merits in interactive
>> Python sessions or IDLE, but it hides encoding errors in all
>> other situations.
> 
> Encoding errors are not hidden, but are represented by hex-escaped
> strings. We can get much more information about the string being
> printed than from a printed traceback.
> 
>> I'm not against changing the repr() of Unicode objects, but
>> please make sure that this change does not break debugging
>> Python applications, whether you're debugging an app using
>> 'print' statements, piping repr() through a socket to a remote
>> debugger or writing information to a log file. The important
>> factor to take into account is the other end that will receive
>> the data.
> 
> I think your request is too vague to be fulfilled. This proposal
> improves the currently broken debugging for me, and I see no information lost
> for debugging. But the "other end" may vary too much to say anything about it.
> 
>> BTW: One problem that your PEP doesn't address, which I mentioned
>> on the ticket:
>>
>> By putting all printable chars into the repr() you lose the
>> ability to actually see the number of code points you have
>> in a Unicode string.
>>
> 
> With the current repr(), I cannot get any information other than the number
> of code points. That is not what I want to know when printing repr().
> For the length of the string, I'll just do print(len(s)).
> 
>> Please name the property Py_UNICODE_ISPRINTABLE. Py_UNICODE_ISHEXESCAPED
>> isn't all that intuitive.
> 
> The name `Py_UNICODE_ISPRINTABLE` came to my mind at first, but I was
> not sure that `printable` is the accurate word. I'm okay with
> Py_UNICODE_ISPRINTABLE, but I'd like to hear opinions. If no one
> objects to Py_UNICODE_ISPRINTABLE, I'll go with it.
> 
>> How can things easily be changed so that it's possible to get the
>> Py2.x style hex escaping back into Py3k without having to change
>> all repr() calls and %r format markers for Unicode objects ?
> 
> I didn't intend to imply "without having to change".  Perhaps
> "migrate" was the wrong word and "port" would be better.
> 
> repr() calls and %r formats are unlikely to need changing in most
> cases. They need to be changed only if pure ASCII is required even when your
> locale is capable of printing the strings.
> 
>> I can see your point with it being easier to read e.g. German,
>> Japanese or Korean data, but it still has to be possible to
>> use repr() for proper debugging which allows the user to
>> actually see what is stored in a Unicode object in terms of
>> code points.
> 
> You can see code points easily; the function I wrote in the PEP to
> convert such strings to the Python 2 style of repr() is a good example. But I
> believe the ordinary use case prefers a readable string over code points.
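
The error-handler behaviours mentioned in this exchange can be compared
directly with str.encode(). A minimal sketch, using ASCII as a stand-in
for a stream encoding that cannot represent the text:

```python
text = "\u65e5\u672c"  # two CJK code points, written as escapes

# 'strict' raises on the first character the encoding cannot represent.
try:
    text.encode("ascii", "strict")
    strict_raised = False
except UnicodeEncodeError:
    strict_raised = True

# 'replace' substitutes '?', losing the original code points.
replaced = text.encode("ascii", "replace")

# 'backslashreplace' emits hex escapes, so the code points stay recoverable.
escaped = text.encode("ascii", "backslashreplace")

print(strict_raised, replaced, escaped)
```

The trade-off argued over above is visible here: 'strict' surfaces the
problem as an exception, while 'backslashreplace' keeps the program
running and preserves the code points in escaped form.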


From mal at egenix.com  Wed May 14 18:23:20 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 14 May 2008 18:23:20 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
Message-ID: <482B11F8.2090200@egenix.com>

On 2008-05-09 18:06, Guido van Rossum wrote:
> On Fri, May 9, 2008 at 3:54 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 2008-05-08 22:55, Terry Reedy wrote:
>>> Functions that map unicode->unicode or bytes->bytes could be called
>>> transcoders.  Each type could be given a .transcode method to go along with
>>> but contrast with .encode or .decode.
>> Are you suggesting to have two separate methods which then
>> allow same-type-conversions ? One for encoding to the same
>> type and one for decoding ?
>>
>> Fine with me.
>>
>> They do have to map naturally to the codec method encode and
>> decode, though, so a single method won't do, unless maybe
>> you add a parameter to define the direction of the coding
>> process.
>>
>> In summary, I'd just like to see the following happen:
>>
>>  * revert the type restrictions on the PyCodec_* API
>>
>>  * enforce the restrictions on the .encode() and .decode()
>>   methods of PyUnicode and PyString objects (str and bytes)
>>
>>  * add a way to PyUnicode and PyString objects (str and bytes)
>>   to allow same type encoding and decoding
> 
> +1

Fine, so we need new methods for PyUnicode and PyString objects
which allow encoding and decoding using the same type (and enforce
the return types).

Any suggestions ?

How about these:

str.str_encode() -> str
str.str_decode() -> str

bytes.bytes_encode() -> bytes
bytes.bytes_decode() -> bytes

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com


From g.brandl at gmx.net  Wed May 14 18:33:41 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Wed, 14 May 2008 18:33:41 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B11F8.2090200@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com>
Message-ID: <g0f42q$29h$1@ger.gmane.org>

M.-A. Lemburg schrieb:
> On 2008-05-09 18:06, Guido van Rossum wrote:
>> On Fri, May 9, 2008 at 3:54 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> On 2008-05-08 22:55, Terry Reedy wrote:
>>>> Functions that map unicode->unicode or bytes->bytes could be called
>>>> transcoders.  Each type could be given a .transcode method to go along with
>>>> but contrast with .encode or .decode.
>>> Are you suggesting to have two separate methods which then
>>> allow same-type-conversions ? One for encoding to the same
>>> type and one for decoding ?
>>>
>>> Fine with me.
>>>
>>> They do have to map naturally to the codec method encode and
>>> decode, though, so a single method won't do, unless maybe
>>> you add a parameter to define the direction of the coding
>>> process.
>>>
>>> In summary, I'd just like to see the following happen:
>>>
>>>  * revert the type restrictions on the PyCodec_* API
>>>
>>>  * enforce the restrictions on the .encode() and .decode()
>>>   methods of PyUnicode and PyString objects (str and bytes)
>>>
>>>  * add a way to PyUnicode and PyString objects (str and bytes)
>>>   to allow same type encoding and decoding
>> 
>> +1

Will this get us the hex, base64 etc. "codecs" back? If yes, great!

> Fine, so we need new methods for PyUnicode and PyString objects
> which allow encoding and decoding using the same type (and enforce
> the return types).
> 
> Any suggestions ?
> 
> How about these:
> 
> str.str_encode() -> str
> str.str_decode() -> str
> 
> bytes.bytes_encode() -> bytes
> bytes.bytes_decode() -> bytes

Cool, a naming contest :)

What about transform/untransform?

Georg


From guido at python.org  Wed May 14 18:55:28 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 May 2008 09:55:28 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <g0f42q$29h$1@ger.gmane.org>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
Message-ID: <ca471dc20805140955i318a661diae8138275e78461b@mail.gmail.com>

On Wed, May 14, 2008 at 9:33 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>  Will this get us the hex, base64 etc. "codecs" back? If yes, great!

If someone does the work, yes. There will need to be some way to add
metadata to codecs to indicate which of the following they support:
str<->bytes, str<->str, bytes<->bytes.

M.-A. Lemburg schrieb:
> > Fine, so we need new methods for PyUnicode and PyString objects
> > which allow encoding and decoding using the same type (and enforce
> > the return types).
> >
> > Any suggestions ?
> >
> > How about these:
> >
> > str.str_encode() -> str
> > str.str_decode() -> str
> >
> > bytes.bytes_encode() -> bytes
> > bytes.bytes_decode() -> bytes

>  Cool, a naming contest :)
>
>  What about transform/untransform?

+1, anything to avoid having to type underscores.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at egenix.com  Wed May 14 19:24:11 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 14 May 2008 19:24:11 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <g0f42q$29h$1@ger.gmane.org>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<g0f42q$29h$1@ger.gmane.org>
Message-ID: <482B203B.3080305@egenix.com>

On 2008-05-14 18:33, Georg Brandl wrote:
> M.-A. Lemburg schrieb:
>>>> In summary, I'd just like to see the following happen:
>>>>
>>>>  * revert the type restrictions on the PyCodec_* API
>>>>
>>>>  * enforce the restrictions on the .encode() and .decode()
>>>>   methods of PyUnicode and PyString objects (str and bytes)
>>>>
>>>>  * add a way to PyUnicode and PyString objects (str and bytes)
>>>>   to allow same type encoding and decoding
>>>
>>> +1
> 
> Will this get us the hex, base64 etc. "codecs" back? If yes, great!

I suppose so :-)

Those would only work on bytes, though, so to convert
the result into text, you'd have to do:

text = bytes.encodebytes('hex').decode('ascii')
bytes = text.encode('ascii').decodebytes('hex')
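
The method names above are proposals from this thread. As a rough
equivalent, here is a sketch using codecs.encode() and codecs.decode()
with the 'hex_codec' bytes-to-bytes codec, assuming an interpreter whose
standard library ships that codec:

```python
import codecs

data = b"\x00\xffabc"

# bytes -> bytes hex encoding, then bytes -> str via ASCII
text = codecs.encode(data, "hex_codec").decode("ascii")

# str -> bytes via ASCII, then bytes -> bytes hex decoding
roundtrip = codecs.decode(text.encode("ascii"), "hex_codec")

print(text)
```

The two-step shape is the same as in the proposal: the hex step stays
within bytes, and a separate ASCII step crosses between bytes and text.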

>> Fine, so we need new methods for PyUnicode and PyString objects
>> which allow encoding and decoding using the same type (and enforce
>> the return types).
>>
>> Any suggestions ?
>>
>> How about these:
>>
>> str.str_encode() -> str
>> str.str_decode() -> str
>>
>> bytes.bytes_encode() -> bytes
>> bytes.bytes_decode() -> bytes
> 
> Cool, a naming contest :)
> 
> What about transform/untransform?

Not bad :-)

Here's a version without underscores:

str.encodestr() -> str
str.decodestr() -> str

bytes.encodebytes() -> bytes
bytes.decodebytes() -> bytes

-- 
Marc-Andre Lemburg
eGenix.com


From mal at egenix.com  Wed May 14 19:27:18 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 14 May 2008 19:27:18 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805140955i318a661diae8138275e78461b@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<g0f42q$29h$1@ger.gmane.org>
	<ca471dc20805140955i318a661diae8138275e78461b@mail.gmail.com>
Message-ID: <482B20F6.20706@egenix.com>

On 2008-05-14 18:55, Guido van Rossum wrote:
> On Wed, May 14, 2008 at 9:33 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>>  Will this get us the hex, base64 etc. "codecs" back? If yes, great!
> 
> If someone does the work, yes. There will need to be some way to add
> metadata to codecs to indicate which of the following they support:
> str<->bytes, str<->str, bytes<->bytes.

No problem: we have codecs.CodecInfo to store such information.

We'd just need a way to describe the supported input/output
type combinations in one or more attributes to that structure.
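
One way to picture such metadata is a small table keyed by codec name.
The sketch below is purely illustrative: the SUPPORTED_TYPES mapping and
the supports() helper are hypothetical and not part of the codecs
module; only codecs.lookup() and the CodecInfo.name attribute are real.

```python
import codecs

# Hypothetical metadata: (input type, output type) of each codec's encode step.
SUPPORTED_TYPES = {
    "utf-8": (str, bytes),
    "hex": (bytes, bytes),
    "rot-13": (str, str),
}

def supports(name, in_type, out_type):
    """Return True if the (hypothetical) table lists this type combination."""
    return SUPPORTED_TYPES.get(name) == (in_type, out_type)

info = codecs.lookup("utf-8")  # real lookup; returns a codecs.CodecInfo
print(info.name, supports(info.name, str, bytes))
```

In an actual design the same information would more likely live as
attributes on the CodecInfo object itself, as suggested above.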

-- 
Marc-Andre Lemburg
eGenix.com


From martin at v.loewis.de  Wed May 14 19:43:41 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 14 May 2008 19:43:41 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B10DF.50105@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<48232FB2.3020205@egenix.com>	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
	<482B10DF.50105@egenix.com>
Message-ID: <482B24CD.5080206@v.loewis.de>

> Hardwiring the encoding is not a good idea, esp. since there
> are lots of alternatives for you to get readable output from
> PyUnicode object now and without any changes to the interpreter.
> 
> E.g.
> 
> print '%s' % u.encode('utf-8')

We are talking about Python 3 here, so it is fairly important
that you consider all syntactic and semantic details of Python
3 - otherwise it is not clear whether or not you are aware of
them:
- the print syntax is incorrect
- .encode returns a byte string
- therefore, %s applies __str__ to the byte string, yielding
  something like b'...', with hex escapes for the non-ASCII
  bytes
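
The last two points can be checked directly. A minimal sketch under
Python 3 semantics:

```python
u = "\u00e9"  # 'é', written as an escape so the source stays ASCII

encoded = u.encode("utf-8")  # a bytes object, not text
formatted = "%s" % encoded   # %s applies str() to the bytes object

print(formatted)
```

The printed text is the b'...' form of the bytes object rather than the
character itself, which is why the quoted example does not produce
readable output.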

> There are many ways to solve your problem.

No. If you strike out those that don't actually work, close
to none remain.

Regards,
Martin

From jimjjewett at gmail.com  Wed May 14 19:45:10 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Wed, 14 May 2008 13:45:10 -0400
Subject: [Python-3000] string API growth [was: Re: PEP 3138- String
	representation in Python 3000]
Message-ID: <fb6fbf560805141045v756640efx5e2f8b58917a5e54@mail.gmail.com>

On 5/14/08, Georg Brandl <g.brandl at gmx.net> wrote:
> M.-A. Lemburg schrieb:
>>> On Fri, May 9, 2008 at 3:54 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> On 2008-05-08 22:55, Terry Reedy wrote:
>>>>> Functions that map unicode->unicode or bytes->bytes could be called
>>>>> transcoders.

bytes->bytes might be, but for many mappings (and all unicode->unicode
mappings) they are general transformers.

If you care about the concrete representation, then you aren't really
dealing with unicode anymore; you're dealing with the ByteString.

>>>> Are you suggesting to have two separate methods which then
>>>> allow same-type-conversions ?

>>>> ... have to map naturally to the codec method encode and
>>>> decode

For str->str or bytes->bytes, how do you decide which direction is
"en"coding vs "de"coding?

> > How about these:

> > str.str_encode() -> str
> > str.str_decode() -> str

> > bytes.bytes_encode() -> bytes
> > bytes.bytes_decode() -> bytes

>  What about transform/untransform?

Maybe I'm missing something, but it seems to me that there are only a
few logical combinations; if the below is wrong, maybe that is one
reason unicode seems more complex than it should.

Encoding:  str -> ByteString
    (staticmethod) ByteString.encode(my_string, encoding=?)
    ==
    my_string.encode(encoding=?)

Decoding:  ByteString -> str
    my_bytes.decode(encoding=?)
    ==
    (staticmethod) str.decode(my_bytes, encoding=?)

General Transforming:
    # Why insist on type-preservation?
    # Why even make these methods?
    my_string.transform(fn) == fn(my_string)
    my_bytes.transform(fn) == fn(my_bytes)

Transcoding:  ByteString -> ByteString
    # If you care how it is represented, it is no longer unicode;
    # it is a specific (ByteString) representation
    mybytes.recode(old_encoding=?, new_encoding)

    # Can the old encoding often be inferred?
    # Or should it always be written because of EIBTI?
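
The transcoding case can be sketched with the existing decode and encode
methods. The recode() function below is a hypothetical helper, not an
existing bytes method, and it always takes the old encoding explicitly:

```python
def recode(data, old_encoding, new_encoding):
    """Transcode bytes by decoding to str and encoding again (hypothetical helper)."""
    return data.decode(old_encoding).encode(new_encoding)

latin1_bytes = "caf\u00e9".encode("latin-1")
utf8_bytes = recode(latin1_bytes, "latin-1", "utf-8")
print(utf8_bytes)
```

Passing the old encoding explicitly sidesteps the inference question
raised above, at the cost of one more argument.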

-jJ

From mal at egenix.com  Wed May 14 20:51:22 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 14 May 2008 20:51:22 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B24CD.5080206@v.loewis.de>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<48232FB2.3020205@egenix.com>	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>	<482B10DF.50105@egenix.com>
	<482B24CD.5080206@v.loewis.de>
Message-ID: <482B34AA.9000001@egenix.com>

On 2008-05-14 19:43, Martin v. Löwis wrote:
>> Hardwiring the encoding is not a good idea, esp. since there
>> are lots of alternatives for you to get readable output from
>> PyUnicode object now and without any changes to the interpreter.
>>
>> E.g.
>>
>> print '%s' % u.encode('utf-8')
> 
> We are talking about Python 3 here, so it is fairly important
> that you consider all syntactic and semantic details of Python
> 3 - otherwise it is not clear whether or not you are aware of
> them:
> - the print syntax is incorrect
> - .encode returns a byte string
> - therefore, %s applies __str__ to the byte string, yielding
>   something like b'...', with hex escapes for the non-ASCII
>   bytes

Sorry, I was in Python 2 mode.

For Python 3 you don't need the .encode() calls since the
stream will take care of that for you:

# Let sys.stdout take care of the encoding
print('"%s"' % u.transform('unicode-printable'))

# Log to a file:
logfile = open('my.log', 'a', encoding='unicode-printable')
logfile.write('"%s"' % u)

# Using a helper
def unicode_repr(u):
     return '"' + u.transform('unicode-printable') + '"'
print(unicode_repr(u))

For the purists: the above assumes that 'unicode-printable'
will encode '"' to '\"'.


BTW: I found that

   logfile = open('my.log', 'a', encoding='unicode-printable')

doesn't raise an exception. Only when you call the .write()
method do you get the expected:

   LookupError: unknown encoding: unicode-printable

Is that intended ? IMO, such errors should not be deferred.
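
Where the error is deferred as described, a caller can surface it early
by validating the codec name first. A minimal sketch using
codecs.lookup(), which raises LookupError for an unregistered name; the
checked_encoding() helper is hypothetical:

```python
import codecs

def checked_encoding(name):
    """Return name unchanged, or raise LookupError if no such codec exists."""
    codecs.lookup(name)
    return name

try:
    checked_encoding("unicode-printable")
    found = True
except LookupError:
    found = False

print(found)
```

A call such as open('my.log', 'a', encoding=checked_encoding(name))
would then fail before the file is created or opened.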

>> There are many ways to solve your problem.
> 
> No. If you strike out those that don't actually work, close
> to none remain.

They may look a bit different, but the logic is essentially
the same.

-- 
Marc-Andre Lemburg
eGenix.com


From ncoghlan at gmail.com  Wed May 14 23:39:00 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 15 May 2008 07:39:00 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B203B.3080305@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com>
Message-ID: <482B5BF4.1090007@gmail.com>

M.-A. Lemburg wrote:
> On 2008-05-14 18:33, Georg Brandl wrote:
>> M.-A. Lemburg schrieb:
>>> Fine, so we need new methods for PyUnicode and PyString objects
>>> which allow encoding and decoding using the same type (and enforce
>>> the return types).
>>>
>>> Any suggestions ?
>>>
>>> How about these:
>>>
>>> str.str_encode() -> str
>>> str.str_decode() -> str
>>>
>>> bytes.bytes_encode() -> bytes
>>> bytes.bytes_decode() -> bytes
>>
>> Cool, a naming contest :)
>>
>> What about transform/untransform?
> 
> Not bad :-)
> 
> Here's a version without underscores:
> 
> str.encodestr() -> str
> str.decodestr() -> str
> 
> bytes.encodebytes() -> bytes
> bytes.decodebytes() -> bytes

A couple more possibilities (Guido is probably going to have to choose a 
colour for this bikeshed somewhere along the line...):

mystr.recodeto('unicode-escaped')
mystr.recodefrom('unicode-escaped')

mybytes.recodeto('hex')
mybytes.recodefrom('hex')

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Wed May 14 23:42:30 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 May 2008 14:42:30 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B5BF4.1090007@gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com>
Message-ID: <ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>

On Wed, May 14, 2008 at 2:39 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> M.-A. Lemburg wrote:
>>
>> On 2008-05-14 18:33, Georg Brandl wrote:
>>>
>>> M.-A. Lemburg schrieb:
>>>>
>>>> Fine, so we need new methods for PyUnicode and PyString objects
>>>> which allow encoding and decoding using the same type (and enforce
>>>> the return types).
>>>>
>>>> Any suggestions ?
>>>>
>>>> How about these:
>>>>
>>>> str.str_encode() -> str
>>>> str.str_decode() -> str
>>>>
>>>> bytes.bytes_encode() -> bytes
>>>> bytes.bytes_decode() -> bytes
>>>
>>> Cool, a naming contest :)
>>>
>>> What about transform/untransform?
>>
>> Not bad :-)
>>
>> Here's a version without underscores:
>>
>> str.encodestr() -> str
>> str.decodestr() -> str
>>
>> bytes.encodebytes() -> bytes
>> bytes.decodebytes() -> bytes
>
> A couple more possibilities (Guido is probably going to have to choose a
> colour for this bikeshed somewhere along the line...):
>
> mystr.recodeto('unicode-escaped')
> mystr.recodefrom('unicode-escaped')
>
> mybytes.recodeto('hex')
> mybytes.recodefrom('hex')

Nah. I'm still in favor of [un]transform. Let's just stick to that.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Thu May 15 02:16:21 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 May 2008 12:16:21 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B11F8.2090200@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com>
Message-ID: <482B80D5.8000202@canterbury.ac.nz>

Wasn't there a big discussion once before about whether
encode/decode should be usable for things other than
unicode<->non-unicode transformations? I thought the
conclusion reached back then was that they shouldn't.

Is there some reason the transformations being talked
about can't just be provided as functions that operate
on strings or bytes?

-- 
Greg

From stephen at xemacs.org  Thu May 15 02:58:15 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 15 May 2008 09:58:15 +0900
Subject: [Python-3000] string API growth [was: Re: PEP 3138-
	String	representation in Python 3000]
In-Reply-To: <fb6fbf560805141045v756640efx5e2f8b58917a5e54@mail.gmail.com>
References: <fb6fbf560805141045v756640efx5e2f8b58917a5e54@mail.gmail.com>
Message-ID: <877idwl6xk.fsf@uwakimon.sk.tsukuba.ac.jp>

Jim Jewett writes:
 > Maybe I'm missing something, but it seems to me that there are only a
 > few logical combinations; 

There are lots of logical combinations, but most of them fall into
"general transform", is that what you mean?

 > if the below is wrong, maybe that is one
 > reason unicode seems more complex than it should.
 > 
 > Encoding:  str -> ByteString
 >     (staticmethod) BytesString.encode(my_string, encoding=?)
 >     ==
 >     my_string.encode(encoding=?)
 > 
 > Decoding:  ByteString -> str
 >     my_bytes.decode(encoding=?)
 >     ==
 >     (staticmethod) str.decode(my_bytes, encoding=?)

+1

 > General Transforming:
 >     # Why insist on type-preservation?
 >     # Why even make these methods?
 >     my_string.transform(fn) == fn(my_string)
 >     my_bytes.transform(fn) == fn(my_bytes)

Make them methods if they are "like" codecs, by which I mean something
like (more or less) invertible stream-oriented transformations.  Eg,

    my_bytes.gzip()

Pretty weak, though.

 > Transcoding:  ByteString -> ByteString
 >     # If you care how it is represented, it is no longer unicode;
 >     # it is a specific (ByteString) representation
 >     mybytes.recode(old_encoding=?, new_encoding)
 > 
 >     # Can the old encoding often be inferred?
 >     # Or should it always be written because of EIBTI?

(1) I agree this is the obvious connotation of "transcode" in the
    codec context.

(2) This usage is too special to deserve treatment at this level,
    especially since for most purposes

    my_bytes.decode(old_encoding).encode(new_encoding)

    will be perfectly sufficient.

(3) old_encoding should not be inferred as part of .decode() or
    .recode(), as such inference is unreliable and domain-specific
    heuristics often lead to great improvements.  A separate
    method/function should be used.
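
A minimal sketch of point (2), assuming the source encoding is already
known to the caller. The helper name recode() is made up for illustration
and is not an existing bytes method:

```python
def recode(data: bytes, old_encoding: str, new_encoding: str) -> bytes:
    """Transcode bytes by going through str; nothing is inferred."""
    # bytes -> str, using the source encoding supplied by the caller
    text = data.decode(old_encoding)
    # str -> bytes, in the target encoding
    return text.encode(new_encoding)


latin1_bytes = "caf\u00e9".encode("latin-1")
utf8_bytes = recode(latin1_bytes, "latin-1", "utf-8")
```

Guessing old_encoding is deliberately left out of the helper, in line
with point (3).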

From guido at python.org  Thu May 15 03:27:56 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 May 2008 18:27:56 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B80D5.8000202@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
Message-ID: <ca471dc20805141827g4da50313u4c704e244c5fde67@mail.gmail.com>

On Wed, May 14, 2008 at 5:16 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Wasn't there a big discussion once before about whether
> encode/decode should be usable for things other than
> unicode<->non-unicode transformations? I thought the
> conclusion reached back then was that they shouldn't.

That was before the idea was brought up to have separate APIs for the
X<->X transforms. The reason to drop those was making the type
signatures of .encode() and .decode() predictable, which is much more
of a concern in 3.0 than it is in 2.x where it's basically string in,
string out and whether that's unicode or 8-bit is a minor detail (in
some cases at least).

> Is there some reason the transformations being talked
> about can't just be provided as functions that operate
> on strings or bytes?

Several people have explained that having these available as
transformations and being able to register new transformations is very
convenient; there is plenty of existing use in 2.x of this feature, so
we're not inventing something new.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From stephen at xemacs.org  Thu May 15 05:00:49 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 15 May 2008 12:00:49 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B80D5.8000202@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
Message-ID: <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > Wasn't there a big discussion once before about whether
 > encode/decode should be usable for things other than
 > unicode<->non-unicode transformations? I thought the
 > conclusion reached back then was that they shouldn't.

That group prevailed, but it was more like a WBA title bout ... here's
the rematch.  This one won't "prove" anything either.<wink>

 > Is there some reason the transformations being talked
 > about can't just be provided as functions that operate
 > on strings or bytes?

This discussion isn't about whether it could be done or not, it's
about where people expect to find such functionality.  Personally, if
I can find .encode('euc-jp') on a string object, I would expect to
find .encode('gzip') on a bytes object, too.

I think this one is just going to come down to BDFL pronouncement
about which is more Pythonic, because I don't really see either point
of view as more "natural".

From guido at python.org  Thu May 15 05:22:12 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 14 May 2008 20:22:12 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>

On Wed, May 14, 2008 at 8:00 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> This discussion isn't about whether it could be done or not, it's
> about where people expect to find such functionality.  Personally, if
> I can find .encode('euc-jp') on a string object, I would expect to
> find .encode('gzip') on a bytes object, too.

The argument against reusing the same method name is that in 3.0 we
need to keep bytes and str instances separate more carefully than we
did in 2.x. Consider code that gets an encoding passed in as a
variable e. It knows it has a bytes instance b. To encode b from bytes
to str (unicode), it can use s = b.decode(e). It can then treat s as a
string, e.g. write it to a text file or pass it to a text processing
class. If the possibility existed that the result was actually a bytes
instance (e.g. when e == 'gzip' instead of e == 'euc-jp') this would
either cause the code to break subtly in the field, or it would
require the programmer to do an additional type check on s before using
it. (And I know quite a few programmers who would feel obliged to
handle this case.)

Of course the possibility always exists that e is not a valid encoding
at all; but that case raises a predictable exception. The same holds in
the case that b can't be decoded using e. Having something be a valid
encoding but return an unusable result is much more problematic.

> I think this one is just going to come down to BDFL pronouncement
> about which is more Pythonic, because I don't really see either point
> of view as more "natural".

It's mostly settled. There will be separate methods to transform bytes
to bytes and to transform str to str, and these will use separate
collections of encodings. (Or perhaps some codecs will apply to
multiple cases, e.g. rot13 might apply both for str<->str and for
bytes<->bytes; but I'd expect gzip to apply only for bytes<->bytes.)
There will be metadata on the codecs so that b.decode("gzip") will
raise an exception just as b.transform("utf-8") will. The details
haven't all been sorted out but so far the only names proposed that I
like are transform() and untransform(). I propose that
b.transform("gzip") would compress and b.untransform("gzip") would
uncompress. I'm fine with the str and bytes methods both being called
transform() and untransform() -- this is no different than the current
situation with e.g. lower() and upper().
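
For illustration only, here is a sketch of the type predictability
described above. It assumes a later CPython 3 interpreter (3.4 or newer,
as far as I know), which is not something settled in this thread.
"zlib_codec" stands in for the hypothetical "gzip" codec, and to_text()
is a made-up helper:

```python
import codecs


def to_text(b: bytes, e: str) -> str:
    """Decode b with encoding e; the result is str or an exception."""
    return b.decode(e)


# A bytes -> bytes codec has to be reached through the codecs module.
compressed = codecs.encode(b"hello", "zlib_codec")

try:
    to_text(compressed, "zlib_codec")
    outcome = "returned"
except LookupError:
    # bytes.decode() refuses codecs that are not text encodings,
    # so the caller never receives bytes where str was expected.
    outcome = "rejected"
```

The caller of to_text() needs no extra type check on the result: a codec
that would produce bytes is rejected up front.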

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Thu May 15 10:13:26 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 15 May 2008 20:13:26 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <482BF0A6.70602@canterbury.ac.nz>

Stephen J. Turnbull wrote:
> This discussion isn't about whether it could be done or not, it's
> about where people expect to find such functionality.  Personally, if
> I can find .encode('euc-jp') on a string object, I would expect to
> find .encode('gzip') on a bytes object, too.

What I'm not seeing is a clear rationale on where you
draw the line. Out of all the possible transformations
between a string and some other kind of data, which
ones deserve to be available via this rather strange
and special interface, and why?

-- 
Greg

From stephen at xemacs.org  Thu May 15 11:12:14 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Thu, 15 May 2008 18:12:14 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482BF0A6.70602@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz>
Message-ID: <87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > What I'm not seeing is a clear rationale on where you
 > draw the line. Out of all the possible transformations
 > between a string and some other kind of data, which
 > ones deserve to be available via this rather strange
 > and special interface, and why?

I don't know nuthin about just desserts.

As I wrote earlier in response to Jim, what I would *expect* to be
provided by this interface (not necessarily named "encode" and
"decode", but invoked as a method with a transformation name as
parameter) are those transformations that are "like codecs":
stream-oriented and invertible.

From mal at egenix.com  Thu May 15 11:48:55 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 15 May 2008 11:48:55 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<g0f42q$29h$1@ger.gmane.org>	<482B203B.3080305@egenix.com>
	<482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
Message-ID: <482C0707.8020805@egenix.com>

On 2008-05-14 23:42, Guido van Rossum wrote:
> On Wed, May 14, 2008 at 2:39 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> M.-A. Lemburg wrote:
>>> On 2008-05-14 18:33, Georg Brandl wrote:
>>>> M.-A. Lemburg schrieb:
>>>>> Fine, so we need new methods for PyUnicode and PyString objects
>>>>> which allow encoding and decoding using the same type (and enforce
>>>>> the return types).
>>>>>
>>>>> Any suggestions ?
>>>>>
>>>>> How about these:
>>>>>
>>>>> str.str_encode() -> str
>>>>> str.str_decode() -> str
>>>>>
>>>>> bytes.bytes_encode() -> bytes
>>>>> bytes.bytes_decode() -> bytes
>>>> Cool, a naming contest :)
>>>>
>>>> What about transform/untransform?
>>> Not bad :-)
>>>
>>> Here's a version without underscores:
>>>
>>> str.encodestr() -> str
>>> str.decodestr() -> str
>>>
>>> bytes.encodebytes() -> bytes
>>> bytes.decodebytes() -> bytes
>> A couple more possibilities (Guido is probably going to have to choose a
>> colour for this bikeshed somewhere along the line...):
>>
>> mystr.recodeto('unicode-escaped')
>> mystr.recodefrom('unicode-escaped')
>>
>> mybytes.recodeto('hex')
>> mybytes.recodefrom('hex')
> 
> Nah. I'm still in favor of [un]transform. Let's just stick to that.

Ok, so I'll add

str.transform() -> str     (uses the encode function of the codec)
str.untransform() -> str   (uses the decode function of the codec)

bytes.transform() -> bytes   (uses the encode function of the codec)
bytes.untransform() -> bytes (uses the decode function of the codec)

Is there an easy way to SVN-revive the removed base64, hex, etc
codec modules in encodings ? As far as I remember, I have to look
for the revision just before they were deleted and then "copy" them
from there using the repo URL.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 15 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From p.f.moore at gmail.com  Thu May 15 12:06:56 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 15 May 2008 11:06:56 +0100
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
Message-ID: <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>

On 15/05/2008, Guido van Rossum <guido at python.org> wrote:
> Consider code that gets an encoding passed in as a
> variable e. It knows it has a bytes instance b. To encode b from bytes
> to str (unicode), it can use s = b.decode(e).

To encode, you use .decode? It's nice to know it's not just me who has
trouble keeping the terminology straight...

Paul.

From mal at egenix.com  Thu May 15 12:22:45 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 15 May 2008 12:22:45 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>
	<87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <482C0EF5.3070205@egenix.com>

On 2008-05-15 11:12, Stephen J. Turnbull wrote:
> Greg Ewing writes:
> 
>  > What I'm not seeing is a clear rationale on where you
>  > draw the line. Out of all the possible transformations
>  > between a string and some other kind of data, which
>  > ones deserve to be available via this rather strange
>  > and special interface, and why?
> 
> I don't know nuthin about just desserts.
> 
> As I wrote earlier in response to Jim, what I would *expect* to be
> provided by this interface (not necessarily named "encode" and
> "decode", but invoked as a method with a transformation name as
> parameter) are those transformations that are "like codecs":
> stream-oriented and invertible.

str.transform(encoding) will use the standard codecs.encode(encoding),
but additionally check that the output has the type str and raise
an error if it doesn't.

Ditto for .untransform(encoding).

For bytes, the methods will check that the output has type bytes and
raise an error if it doesn't.

The methods could also check the meta-data on the found codecs
before actually running the transformation, but that may not
always lead to usable results, e.g. if a codec can handle both
str->str and bytes->bytes by doing the type check itself.

In any case, the above type checks will always be applied, so that
callers never get a result of an unexpected type.
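
A rough sketch of the output-type check described above, written as free
functions because str and bytes have no such methods. The names
transform(), untransform() and _checked() are placeholders, and
"hex_codec" and "rot_13" are used only as example codecs:

```python
import codecs


def _checked(result, data):
    """Return result only if it has the same kind of type as data."""
    expected = str if isinstance(data, str) else bytes
    if not isinstance(result, expected):
        raise TypeError(
            "codec returned %s, expected %s"
            % (type(result).__name__, expected.__name__)
        )
    return result


def transform(data, encoding):
    # Uses the encode function of the codec, then checks the output type.
    return _checked(codecs.encode(data, encoding), data)


def untransform(data, encoding):
    # Uses the decode function of the codec, then checks the output type.
    return _checked(codecs.decode(data, encoding), data)
```

With this sketch, transform("abc", "utf-8") raises TypeError because the
codec hands back bytes for a str input.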

I'll write up a PEP once we have a better understanding of the
details, e.g. of how the codec type information should be
defined...

Here's a straightforward approach. A codec that handles both
bytes->bytes and str->str would declare:

codecinfo.encode_type_combinations = [(bytes, bytes), (str, str)]
codecinfo.decode_type_combinations = [(bytes, bytes), (str, str)]

For most codecs (e.g. utf-8, latin-1, cp850, etc.) this would
then be:

codecinfo.encode_type_combinations = [(str, bytes)]
codecinfo.decode_type_combinations = [(bytes, str)]
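
To make the idea concrete, a small sketch of how such metadata could be
queried. CodecTypeInfo and supports() are hypothetical names; the real
codecs.CodecInfo has no such attributes:

```python
from dataclasses import dataclass, field


@dataclass
class CodecTypeInfo:
    """Hypothetical per-codec type metadata (not part of codecs)."""
    name: str
    encode_type_combinations: list = field(default_factory=list)
    decode_type_combinations: list = field(default_factory=list)


def supports(info, direction, in_type, out_type):
    """Check whether (in_type, out_type) is declared for a direction."""
    combos = getattr(info, direction + "_type_combinations")
    return (in_type, out_type) in combos


# A codec usable for both bytes->bytes and str->str transforms.
ROT13 = CodecTypeInfo(
    "rot13",
    encode_type_combinations=[(bytes, bytes), (str, str)],
    decode_type_combinations=[(bytes, bytes), (str, str)],
)

# A typical character encoding.
UTF8 = CodecTypeInfo(
    "utf-8",
    encode_type_combinations=[(str, bytes)],
    decode_type_combinations=[(bytes, str)],
)
```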

-- 
Marc-Andre Lemburg
eGenix.com


From ncoghlan at gmail.com  Thu May 15 12:34:32 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 15 May 2008 20:34:32 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482BF0A6.70602@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz>
Message-ID: <482C11B8.3010505@gmail.com>

Greg Ewing wrote:
> Stephen J. Turnbull wrote:
>> This discussion isn't about whether it could be done or not, it's
>> about where people expect to find such functionality.  Personally, if
>> I can find .encode('euc-jp') on a string object, I would expect to
>> find .encode('gzip') on a bytes object, too.
> 
> What I'm not seeing is a clear rationale on where you
> draw the line. Out of all the possible transformations
> between a string and some other kind of data, which
> ones deserve to be available via this rather strange
> and special interface, and why?
> 

Where this kind of unified interface to binary and character transforms 
is incredibly handy is in a stacking IO model like the one used in Py3k. 
For example, suppose you're using a compressed XML stream to communicate 
over a network socket. What this approach allows you to do is have 
generic 'transformation' layers in your IO stack, so you can just build 
up your IO stack as something like:

XMLParserIO('myschema')
BufferedTextIO('utf-8')
BytesTransform('gzip')
RawSocketIO

To change to a different compression mechanism (e.g. bz2), you just
change the codec used by the BytesTransform layer from 'gzip' to 'bz2'.
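
A very reduced sketch of that layering, using plain functions instead of
IO classes. pack() and unpack() are made-up names, and "zlib_codec" and
"bz2_codec" stand in for the 'gzip' and 'bz2' layers mentioned above:

```python
import codecs


def pack(text, compression="zlib_codec"):
    """Text layer (utf-8) stacked on a bytes->bytes transform layer."""
    raw = text.encode("utf-8")              # str -> bytes
    return codecs.encode(raw, compression)  # bytes -> bytes


def unpack(payload, compression="zlib_codec"):
    """Undo the layers in reverse order."""
    raw = codecs.decode(payload, compression)  # bytes -> bytes
    return raw.decode("utf-8")                 # bytes -> str


message = "<doc>hello</doc>"
wire = pack(message)
```

Switching the compression means passing a different codec name; the text
layer does not change.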

As for how you choose what to provide as codecs... well, that's a major 
reason why the codec registry is extensible. The answer is that any 
binary or character transform which is useful to the application 
programmer can be accessed via the codec API - the only question will be 
whether the application programmer will have to write the codec 
themselves, or will find it already provided in the standard library.

Cheers,
Nick.

P.S. My original tangential response that didn't actually answer your 
question, but may still be useful to some folks:

An actual codec that encodes a character string to a byte sequence, and 
decodes a byte sequence back to a character string would be invoked via 
the str.encode() and bytes.decode() methods. For example, 
mystr.encode('utf-8') to serialise a string using UTF-8, 
mybytes.decode('utf-8') to read it back.

A text transform that converts a character string to a different 
character string would be invoked via the str.transform() and 
str.untransform() methods. For example, 
mystr.transform('unicode-escape') to convert unicode characters to their 
\u or \U equivalents, mystr.untransform('unicode-escape') to convert 
them back to the actual unicode characters.

A binary transform that converts a byte sequence to a different byte 
sequence would be invoked via the bytes.transform() and 
bytes.untransform() methods. For example, mybytes.transform('gzip') to 
compress a byte sequence, mybytes.untransform('gzip') to decompress it.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From mal at egenix.com  Thu May 15 12:38:11 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 15 May 2008 12:38:11 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
Message-ID: <482C1293.3030409@egenix.com>

On 2008-05-15 12:06, Paul Moore wrote:
> On 15/05/2008, Guido van Rossum <guido at python.org> wrote:
>> Consider code that gets an encoding passed in as a
>> variable e. It knows it has a bytes instance b. To encode b from bytes
>> to str (unicode), it can use s = b.decode(e).
> 
> To encode, you use .decode? It's nice to know it's not just me who has
> trouble keeping the terminology straight...

It's all a matter of perspective. You can say you're encoding Latin-1
to Unicode, or you can say you're encoding Unicode to Latin-1.

Python's Unicode implementation regards PyUnicode as the "bigger" type
than PyString (*), since it can hold all possible code points, so when
going from the "bigger" type to the smaller one, you *encode*, whereas
when going from the smaller one to the bigger one, you *decode*.

For codecs in general, you have a source and a destination defining
the codec (= coding / decoding). When going from the source to the
destination you *encode*, the other way around is *decoding*.

(*) This is why coercion in Py2 goes from PyString to PyUnicode and
not the other way around.

-- 
Marc-Andre Lemburg
eGenix.com


From ncoghlan at gmail.com  Thu May 15 13:01:41 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 15 May 2008 21:01:41 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482C0EF5.3070205@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482C0EF5.3070205@egenix.com>
Message-ID: <482C1815.1050008@gmail.com>

M.-A. Lemburg wrote:
> I'll write up a PEP once we have a better understanding of the
> details, e.g. of how the codec type information should be
> defined...
> 
> Here's a straight-forward approach:
> 
> codecinfo.encode_type_combinations = [(bytes, bytes), (str, str)]
> codecinfo.decode_type_combinations = [(bytes, bytes), (str, str)]
> 
> for most codecs (e.g. utf-8, latin-1, cp850, etc.) this would
> then be:
> 
> codecinfo.encode_type_combinations = [(str, bytes)]
> codecinfo.decode_type_combinations = [(bytes, str)]

Do we need something that flexible? Would a simpler approach with 
separate "binary_transform" and "text_transform" flags be enough?

With the latter approach, the encode()/decode() methods could complain 
if either of the transform flags was set on the codec, while the 
transform()/untransform() methods could complain if the appropriate 
transform flag *wasn't* set.
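
A sketch of what the flag-based checks might look like. CodecFlags and
both check functions are hypothetical, and the choice of LookupError is
an assumption:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecFlags:
    """Hypothetical flags attached to a codec."""
    binary_transform: bool = False
    text_transform: bool = False


def check_encode_decode(flags):
    # encode()/decode() complain if either transform flag is set.
    if flags.binary_transform or flags.text_transform:
        raise LookupError("codec is a transform, not a character encoding")


def check_transform(flags, data):
    # transform()/untransform() complain if the flag matching the
    # kind of data (text vs. binary) is not set.
    if isinstance(data, str):
        allowed = flags.text_transform
    else:
        allowed = flags.binary_transform
    if not allowed:
        raise LookupError("codec does not support this kind of transform")


GZIP = CodecFlags(binary_transform=True)
UTF8 = CodecFlags()
```

Because the binary branch is taken for anything that is not str, the
same flag covers both bytes and bytearray input.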

Note also that both bytearray and bytes provide decode() methods, and 
will presumably provide transform() methods, so actual type annotations 
may not be the best way to go about this.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From mal at egenix.com  Thu May 15 13:48:40 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 15 May 2008 13:48:40 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482C1815.1050008@gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp>	<482C0EF5.3070205@egenix.com>
	<482C1815.1050008@gmail.com>
Message-ID: <482C2318.50205@egenix.com>

On 2008-05-15 13:01, Nick Coghlan wrote:
> M.-A. Lemburg wrote:
>> I'll write up a PEP once we have a better understanding of the
>> details, e.g. of how the codec type information should be
>> defined...
>>
>> Here's a straight-forward approach:
>>
>> codecinfo.encode_type_combinations = [(bytes, bytes), (str, str)]
>> codecinfo.decode_type_combinations = [(bytes, bytes), (str, str)]
>>
>> for most codecs (e.g. utf-8, latin-1, cp850, etc.) this would
>> then be:
>>
>> codecinfo.encode_type_combinations = [(str, bytes)]
>> codecinfo.decode_type_combinations = [(bytes, str)]
> 
> Do we need something that flexible? Would a simpler approach with 
> separate "binary_transform" and "text_transform" flags be enough?
> 
> With the latter approach, the encode()/decode() methods could complain 
> if either of the transform flags was set on the codec, while the 
> transform()/untransform() methods could complain if the appropriate 
> transform flag *wasn't* set.

The above is a mechanism for codecs which do have a very
flexible interface in terms of supported types.

The methods on various objects are just convenience helpers
for easier access and in Py3k also provide type-safety.

The .transform() methods would simply check for the corresponding
type combination, ie. str.transform() would check for (str, str).
str.encode() would check for (str, bytes), bytes.decode() for
(bytes, str).

Alternatively, we could just not check the type combinations
at all and only apply the result type check.

> Note also that both bytearray and bytes provide decode() methods, and 
> will presumably provide transform() methods, so actual type annotations 
> may not be the best way to go about this.

I'm not sure I understand.

-- 
Marc-Andre Lemburg
eGenix.com


From ncoghlan at gmail.com  Thu May 15 15:42:24 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 15 May 2008 23:42:24 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482C2318.50205@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp>	<482C0EF5.3070205@egenix.com>
	<482C1815.1050008@gmail.com> <482C2318.50205@egenix.com>
Message-ID: <482C3DC0.4020600@gmail.com>

M.-A. Lemburg wrote:
> The .transform() methods would simply check for the corresponding
> type combination, ie. str.transform() would check for (str, str).
> str.encode() would check for (str, bytes), bytes.decode() for
> (bytes, str).
> 
> Alternatively, we could just not check the type combinations
> at all and only apply the result type check.
> 
>> Note also that both bytearray and bytes provide decode() methods, and 
>> will presumably provide transform() methods, so actual type 
>> annotations may not be the best way to go about this.
> 
> I'm not sure I understand.

If we went with the approach of checking type annotations on the codec, 
then would a codec which was only annotated with (bytes, str) on the 
decode method be usable by bytearray.decode()?

And if we aren't going to check the type annotations before invoking the 
codec, what's the point in having them at all? Better to leave them out 
entirely, invoke the relevant method of the named codec and see if we 
get the right type back.
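
A minimal sketch of that alternative, assuming the check happens only
after the codec has run. The helper name decode_checked is made up for
illustration; codecs.decode() and the "zlib_codec" codec are the ones
shipped with the standard library.

```python
import codecs


def decode_checked(data, codec_name, expected_type=str):
    """Run the named codec first, then verify the type of the result."""
    result = codecs.decode(data, codec_name)
    if not isinstance(result, expected_type):
        raise TypeError(
            f"{codec_name!r} returned {type(result).__name__}, "
            f"expected {expected_type.__name__}"
        )
    return result


# No annotation is consulted, so a bytearray works as soon as the codec
# itself accepts it.
text = decode_checked(bytearray(b"abc"), "utf-8")
```

The cost of this approach is that a mismatch is only detected once the
whole decode has already been done.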

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From mal at egenix.com  Thu May 15 16:22:13 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 15 May 2008 16:22:13 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482C3DC0.4020600@gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<87prroylqp.fsf@uwakimon.sk.tsukuba.ac.jp>	<482C0EF5.3070205@egenix.com>	<482C1815.1050008@gmail.com>
	<482C2318.50205@egenix.com> <482C3DC0.4020600@gmail.com>
Message-ID: <482C4715.3040906@egenix.com>

On 2008-05-15 15:42, Nick Coghlan wrote:
> M.-A. Lemburg wrote:
>> The .transform() methods would simply check for the corresponding
>> type combination, ie. str.transform() would check for (str, str).
>> str.encode() would check for (str, bytes), bytes.decode() for
>> (bytes, str).
>>
>> Alternatively, we could just not check the type combinations
>> at all and only apply the result type check.
>>
>>> Note also that both bytearray and bytes provide decode() methods, and 
>>> will presumably provide transform() methods, so actual type 
>>> annotations may not be the best way to go about this.
>>
>> I'm not sure I understand.
> 
> If we went with the approach of checking type annotations on the codec, 
> then would a codec which was only annotated with (bytes, str) on the 
> decode method be usable by bytearray.decode()?

Probably not, but the suggested form allows adding (bytearray, str)
if the codec supports this as well, and bytearray.decode() could check
for that combination.

> And if we aren't going to check the type annotations before invoking the 
> codec, what's the point in having them at all? 

They provide meta-information about the codec capabilities and
may be useful in other contexts as well, e.g. if you want to
add an .encode() method to some other object.

>  Better to leave them out
> entirely, invoke the relevant method of the named codec and see if we 
> get the right type back.

That's an option, yes.

OTOH, if you first decode a 100MB data string
using e.g. gzip and then find that the return type doesn't match
what you had expected, the added global warming due to wasted
CPU heat is going to make you feel rather uncomfortable :-)

-- 
Marc-Andre Lemburg
eGenix.com


From param at cs.wisc.edu  Thu May 15 17:06:34 2008
From: param at cs.wisc.edu (Paramjit Oberoi)
Date: Thu, 15 May 2008 08:06:34 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482335CE.7000309@egenix.com> <fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
Message-ID: <e443ad0e0805150806j60d63ec6k8fc735d1d9ba79bc@mail.gmail.com>

On Thu, May 15, 2008 at 3:06 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 15/05/2008, Guido van Rossum <guido at python.org> wrote:
>> Consider code that gets an encoding passed in as a
>> variable e. It knows it has a bytes instance b. To encode b from bytes
>> to str (unicode), it can use s = b.decode(e).
>
> To encode, you use .decode? It's nice to know it's not just me who has
> trouble keeping the terminology straight...

It takes a lot of effort, and constant vigilance, to keep
encode/decode straight in one's head.  Maybe this means they need to
be renamed to something like tobytes() and tostring()?

tostring() is probably not the best choice though - too much baggage from Java.

-param

From ishimoto at gembook.org  Thu May 15 18:13:01 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 16 May 2008 01:13:01 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482B10DF.50105@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48232FB2.3020205@egenix.com>
	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
	<482B10DF.50105@egenix.com>
Message-ID: <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>

On Thu, May 15, 2008 at 1:18 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> Atsuo,
>
> you are not really addressing my arguments in your reply.
>
> My main concern is that repr(unicode) as well as '%r' is used
> a lot in logging and debugging of applications.
>
> In the 2.x series of Python, the output of repr() has traditionally
> always been plain ASCII and does not require any special encoding
> and also doesn't run into problems when mixing the output with
> other encodings used in the log file, on the console or wherever
> the output of repr() is sent.
>
> You are now suggesting to break this convention by allowing
> all printable code points to be used in the repr() output.
> Depending on where you send the repr() output and the contents
> of the PyUnicode object, this will likely result in exceptions
> in the .write() method of the stream object.
>

I can't understand why Python 3000 should stick to an ASCII repr(). If
your concern is about output, it should be addressed by the file object
when printing. repr() generates text describing an object, and the
file encodes that text for the user's environment on output. This is
a straightforward, flexible and common pattern in Unicode
applications.

> Just adjusting sys.stdout and sys.stderr to prevent them from
> falling over is not enough (and is indeed not within the scope
> of the PEP, since those changes are *major* and not warranted
> for just getting your Unicode repr() to work). repr() is very
> often written to log files and those would all have to be
> changed as well.
>

For files other than sys.std*, I see no problem with::

log = open(filename, "w", errors='backslashreplace')
log.write("%r" % obj)

Although I would prefer 'backslashreplace' as the default value for errors.
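
A slightly fuller sketch of the log-file case, assuming the log is
deliberately kept in ASCII. The helper name write_repr and the file name
app.log are made up for the example; open() and its errors argument are
the standard ones.

```python
import os
import tempfile


def write_repr(path, obj, encoding="ascii"):
    """Append-free demo: write repr(obj), escaping what cannot be encoded."""
    with open(path, "w", encoding=encoding, errors="backslashreplace") as log:
        log.write("%r\n" % (obj,))


path = os.path.join(tempfile.mkdtemp(), "app.log")
write_repr(path, "Hello\u03c8")

# The file contains only ASCII; the psi was written as a \u escape.
with open(path, encoding="ascii") as log:
    logged = log.read()
```

The write does not raise even though the log encoding cannot represent
the Greek letter.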

>  - Are there alternative ways to get the "problem" fixed ?
>  - Is the added convenience worth breaking existing conventions ?

I would like to call it "improve", not break :)

>  - Is it worth breaking existing applications ?

I guess the number of applications broken by this change would be small,
and the fix would be easy.
So I think it is worth it, and perhaps a lot of programmers in non-Latin
countries might think so, too. Apparently, this PEP brings you
concern without any benefit. But this PEP is necessary to make the
most of Unicode's abilities for debugging and logging.

>
> I've suggested making the repr() output configurable to address
> the convenience aspect of your proposal. You could then set the
> output encoding to e.g. "unicode-printable" and get your preferred
> output. The default could remain set to the current all-ASCII output.
>

I'm sorry, I cannot understand what the "unicode-printable" codec does.
Could you please explain it?

I don't like making repr() adjustable (I presume you mean making
unicode_repr() in Objects/unicodeobject.c adjustable), because the old
repr() convention would remain in place by default, and third-party
applications or libraries could fail when I use my custom repr() function.

From p.f.moore at gmail.com  Thu May 15 18:49:06 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 15 May 2008 17:49:06 +0100
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48232FB2.3020205@egenix.com>
	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
	<482B10DF.50105@egenix.com>
	<797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
Message-ID: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>

On 15/05/2008, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
> I would like to call it "improve", not break :)

Please can you help me understand the impact here. I am running
Windows XP (UK English - console code page 850, which is some variety
of Latin 1). Currently, printing non-latin1 characters gives me an
exception: for example,

>>> print("Hello\u03C8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Apps\Python30\lib\io.py", line 1103, in write
    b = s.encode(self._encoding)
  File "D:\Apps\Python30\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character '\u03c8' in
position 5: character maps to <undefined>

(This is 3.0a1 - I don't know if much has changed in more recent
alphas, if it's significant I can upgrade and try again).

Can you explain what I need to change to make sys.stdout behave as you
propose? If you can do that, I can test what I will see in your
proposal if I type print(repr("Hello\u03C8")). My suspicion is that I
will see unreadable garbage, rather than what I currently get, which
is backslash-escaped, but readable.

The key point here is that I don't think you're proposing to detect
the user's display capabilities and adapt the output to match, so if
my display can't cope with the full Unicode character set, I'll have
to make manual adjustments or see broken output.

Like it or not, a large proportion of Python's users still work in
environments where much of the Unicode character space is not
displayed readably.

My apologies if I misunderstood your proposal - I have almost no
Unicode experience, and that probably shows :-)

Paul.

From p.f.moore at gmail.com  Thu May 15 18:53:39 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Thu, 15 May 2008 17:53:39 +0100
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48232FB2.3020205@egenix.com>
	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
	<482B10DF.50105@egenix.com>
	<797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
	<79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
Message-ID: <79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com>

On 15/05/2008, Paul Moore <p.f.moore at gmail.com> wrote:
> My apologies if I misunderstood your proposal - I have almost no
> Unicode experience, and that probably shows :-)

One point I forgot to clarify is that I'm fully aware that
print(arbitrary_string) may display garbage, if the string contains
Unicode that my display can't handle. The key point for me is that
print(repr(arbitrary_string)) is *guaranteed* to display correctly,
even on my limited-capability terminal, precisely because it only uses
ASCII and no matter how dumb, all terminals I know of display ASCII.
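
For reference, a small sketch of how the two behaviours can coexist,
assuming a Python 3 interpreter in which PEP 3138 is implemented: there
repr() keeps printable non-ASCII characters, and the ascii() builtin gives
the escaped, ASCII-only form.

```python
text = "Hello\u03c8"

# Keeps the printable Greek letter as-is.
readable = repr(text)

# Escapes everything outside ASCII, like the Python 2 repr() did.
escaped = ascii(text)

print(escaped)
```

Only the second form is safe to send to a terminal that can display
nothing but ASCII.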

Paul.

From phd at phd.pp.ru  Thu May 15 19:03:32 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 15 May 2008 21:03:32 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48232FB2.3020205@egenix.com>
	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
	<482B10DF.50105@egenix.com>
	<797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
	<79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
Message-ID: <20080515170332.GA9117@phd.pp.ru>

On Thu, May 15, 2008 at 05:49:06PM +0100, Paul Moore wrote:
> Like it or not, a large proportion of Python's users still work in
> environments where much of the Unicode character space is not
> displayed readably.

   How large is that "large proportion"? 10%? 50%? 90%? How often are users
working in an ASCII-only environment confronted with non-ASCII strings?

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From phd at phd.pp.ru  Thu May 15 19:06:02 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 15 May 2008 21:06:02 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48232FB2.3020205@egenix.com>
	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
	<482B10DF.50105@egenix.com>
	<797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
	<79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
	<79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com>
Message-ID: <20080515170602.GB9117@phd.pp.ru>

On Thu, May 15, 2008 at 05:53:39PM +0100, Paul Moore wrote:
> One point I forgot to clarify is that I'm fully aware that
> print(arbitrary_string) may display garbage, if the string contains
> Unicode that my display can't handle. The key point for me is that
> print(repr(arbitrary_string)) is *guaranteed* to display correctly,
> even on my limited-capability terminal, precisely because it only uses
> ASCII and no matter how dumb, all terminals I know of display ASCII.

   That's up to print() or any other output device to decide, not up to
repr(). If I send repr() from a CGI back to the browser it doesn't matter
if the server is ASCII-only, it only matters if the browser can display
Unicode.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From ishimoto at gembook.org  Thu May 15 19:50:22 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 16 May 2008 02:50:22 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48232FB2.3020205@egenix.com>
	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
	<482B10DF.50105@egenix.com>
	<797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
	<79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
Message-ID: <797440730805151050g472d947r18e8f7c7d520d44e@mail.gmail.com>

On Fri, May 16, 2008 at 1:49 AM, Paul Moore <p.f.moore at gmail.com> wrote:
> On 15/05/2008, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
>> I would like to call it "improve", not break :)
>
> Please can you help me understand the impact here. I am running
> Windows XP (UK English - console code page 850, which is some variety
> of Latin 1). Currently, printing non-latin1 characters gives me an
> exception: for example,
>
>>>> print("Hello\u03C8")
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "D:\Apps\Python30\lib\io.py", line 1103, in write
>    b = s.encode(self._encoding)
>  File "D:\Apps\Python30\lib\encodings\cp850.py", line 12, in encode
>    return codecs.charmap_encode(input,errors,encoding_map)
> UnicodeEncodeError: 'charmap' codec can't encode character '\u03c8' in
> position 5: character maps to <undefined>
>
> (This is 3.0a1 - I don't know if much has changed in more recent
> alphas, if it's significant I can upgrade and try again).
>
> Can you explain what I need to change to make sys.stdout behave as you
> propose? If you can do that, I can test what I will see in your
> proposal if I type print(repr("Hello\u03C8")). My suspicion is that I
> will see unreadable garbage, rather than what I currently get, which
> is backslash-escaped, but readable.

With my proposal, print("Hello\u03C8") prints "Hello\u03C8" instead of
raising an exception. And print(repr("Hello\u03C8")) prints
"'Hello\u03C8'", so no garbage is printed.

Now, let's say you are Greek and working on the Greek version of XP.
print("Hello\u03C8") prints "Hello" + the correct Greek character (GREEK
SMALL LETTER PSI). And print(repr("Hello\u03C8")) prints
"'Hello" + the correct Greek character + "'". If you have a Greek font, you
can try this by switching your command prompt with "chcp 1253" (change the
codepage to 1253).

>
> The key point here is that I don't think you're proposing to detect
> the user's display capabilities and adapt the output to match, so if
> my display can't cope with the full Unicode character set, I'll have
> to make manual adjustments or see broken output.
>
Python has detected the user's capabilities since Python 2.x (or 1.6? I forget).
On Windows, Python detects the user's encoding from the codepage. On Unix, the
locale is used to detect the encoding.
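
A small sketch of what that detection looks like from Python code. The
helper names detected_encoding and can_display are made up for this
example; sys.stdout.encoding and locale.getpreferredencoding() are the
real interfaces.

```python
import locale
import sys


def detected_encoding():
    """Encoding Python picked for stdout, falling back to the locale's."""
    return getattr(sys.stdout, "encoding", None) or locale.getpreferredencoding(False)


def can_display(text, encoding):
    """True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


print(detected_encoding())
print(can_display("Hello\u03c8", "cp1253"))  # Greek code page
print(can_display("Hello\u03c8", "cp850"))   # Western European code page
```

Note that this only says whether the encoding can represent a character,
not whether the terminal font has a glyph for it.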

> Like it or not, a large proportion of Python's users still work in
> environments where much of the Unicode character space is not
> displayed readably.
>

I agree. So rejecting my proposal as "not a common use case" might be
reasonable. But I should argue to win sympathy anyway :).

> One point I forgot to clarify is that I'm fully aware that
> print(arbitrary_string) may display garbage, if the string contains
> Unicode that my display can't handle. The key point for me is that
> print(repr(arbitrary_string)) is *guaranteed* to display correctly,
> even on my limited-capability terminal, precisely because it only uses
> ASCII and no matter how dumb, all terminals I know of display ASCII.

I can understand your concern. Perhaps you don't want to see your terminal
flash from escape sequences, beep, print endless graphic characters, etc. For
legacy byte-string applications (whether written in C or Python),
printing an arbitrary string can cause such a mess. But this is unlikely to
happen when printing a Unicode string, since the characters your
terminal cannot understand will be escaped or converted to a
character such as '?'.

Hope this helps.

From ncoghlan at gmail.com  Fri May 16 00:56:44 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 16 May 2008 08:56:44 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<48232FB2.3020205@egenix.com>	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>	<482B10DF.50105@egenix.com>	<797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
	<79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
Message-ID: <482CBFAC.9040004@gmail.com>

Paul Moore wrote:
> On 15/05/2008, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
>> I would like to call it "improve", not break :)
> 
> Please can you help me understand the impact here. I am running
> Windows XP (UK English - console code page 850, which is some variety
> of Latin 1). Currently, printing non-latin1 characters gives me an
> exception: for example,

As Oleg and Atsuo already pointed out, this is addressed in the PEP by 
switching the encoding error mode on sys.stderr and sys.stdout to 
backslashreplace instead of the current strict.

So not only will repr() still display correctly for you, all other 
strings containing Unicode characters will start displaying as well 
(with Unicode escapes in place of the glyphs your display can't cope with).
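
The effect can be simulated without touching the real sys.stdout. The
sketch below is only an illustration, with a made-up helper name
(render): it wraps an in-memory buffer in a text stream that uses code
page 850 and compares the two error modes.

```python
import io


def render(text, encoding, errors):
    """Return the bytes a text stream with this encoding would emit."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors, newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# 'strict' reproduces the traceback quoted above.
try:
    render("Hello\u03c8", "cp850", "strict")
except UnicodeEncodeError as exc:
    print("strict:", exc.reason)

# 'backslashreplace' degrades to an escape instead of failing.
print(render("Hello\u03c8", "cp850", "backslashreplace"))
```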

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From greg.ewing at canterbury.ac.nz  Fri May 16 01:30:29 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 May 2008 11:30:29 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482C0707.8020805@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
	<482C0707.8020805@egenix.com>
Message-ID: <482CC795.1050405@canterbury.ac.nz>

M.-A. Lemburg wrote:
> str.transform() -> str     (uses the encode function of the codec)
> str.untransform() -> str   (uses the decode function of the codec)

Not sure I like those names. It's rather unclear which
direction is "transform" and which is "untransform".

People seem to have trouble enough with "encode" and
"decode", but at least there's a clear definition of
that from Unicode-land, and there's the type difference
to catch the mistake if you get it wrong.

Since both ends have the same type here, it's more
important to find unambiguous names if possible.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Fri May 16 01:36:42 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 May 2008 11:36:42 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482C11B8.3010505@gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com>
Message-ID: <482CC90A.5010907@canterbury.ac.nz>

Nick Coghlan wrote:
> What this approach allows you to do is have 
> generic 'transformation' layers in your IO stack, so you can just build 
> up your IO stack as something like:
> 
> XMLParserIO('myschema')
> BufferedTextIO('utf-8')
> BytesTransform('gzip')
> RawSocketIO

There's nothing wrong with that, but what it doesn't
answer is why it's not sufficient just to do things
like

from gzip import gzip_codec
stream2 = BytesTransform(gzip_codec, stream1)

i.e. why there has to be a special kind of namespace
for codecs.

-- 
Greg

From guido at python.org  Fri May 16 01:46:31 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 15 May 2008 16:46:31 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482CC795.1050405@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
	<482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz>
Message-ID: <ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>

On Thu, May 15, 2008 at 4:30 PM, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> M.-A. Lemburg wrote:
>>
>> str.transform() -> str     (uses the encode function of the codec)
>> str.untransform() -> str   (uses the decode function of the codec)
>
> Not sure I like those names. It's rather unclear which
> direction is "transform" and which is "untransform".
>
> People seem to have trouble enough with "encode" and
> "decode", but at least there's a clear definition of
> that from Unicode-land, and there's the type difference
> to catch the mistake if you get it wrong.
>
> Since both ends have the same type here, it's more
> important to find unambiguous names if possible.

Really? Don't you think it's pretty obvious that b.transform("gzip")
compresses and b.untransform("gzip") decompresses? Or that
b.transform("base64") generates base64 format?
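
A rough sketch of those semantics as free functions, assuming the
bytes-to-bytes codecs reachable through codecs.encode() and
codecs.decode(). The function names transform and untransform are
hypothetical stand-ins for the proposed methods, and "zlib_codec" is used
because the standard library ships that codec rather than a "gzip" one.

```python
import codecs


def transform(data, name):
    """Forward direction: uses the encode function of the named codec."""
    return codecs.encode(data, name)


def untransform(data, name):
    """Reverse direction: uses the decode function of the named codec."""
    return codecs.decode(data, name)


payload = b"spam " * 100
packed = transform(payload, "zlib_codec")      # compresses
restored = untransform(packed, "zlib_codec")   # decompresses
armored = transform(b"hi", "base64_codec")     # generates base64
```

Both ends are bytes here, so nothing but the method name tells the two
directions apart.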

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From greg.ewing at canterbury.ac.nz  Fri May 16 03:39:54 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 May 2008 13:39:54 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48232FB2.3020205@egenix.com>
	<797440730805091023p75f64540r9071d77247d155fd@mail.gmail.com>
	<482B10DF.50105@egenix.com>
	<797440730805150913h78ddde9bk435959b33a6987ee@mail.gmail.com>
	<79990c6b0805150949k29d95adfn9fb30f54df618d55@mail.gmail.com>
	<79990c6b0805150953w226c6e93r14d4888b47fa9ab@mail.gmail.com>
Message-ID: <482CE5EA.1020504@canterbury.ac.nz>

Paul Moore wrote:
> The key point for me is that
> print(repr(arbitrary_string)) is *guaranteed* to display correctly,
> even on my limited-capability terminal, precisely because it only uses
> ASCII and no matter how dumb, all terminals I know of display ASCII.

That still sounds like something that the I/O object
connected to the terminal should deal with. You'll
have the same problem with any other unicode output
that ends up going to the terminal, so it has to
deal with it anyway.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Fri May 16 04:46:21 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 May 2008 14:46:21 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
	<482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz>
	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>
Message-ID: <482CF57D.6010200@canterbury.ac.nz>

Guido van Rossum wrote:

> Really? Don't you think it's pretty obvious that b.transform("gzip")
> compresses and b.untransform("gzip") decompresses? Or that
> b.transform("base64") generates base64 format?

Well, maybe. I think the problem is that the word
"transform" is inherently direction-neutral, and it
only becomes obvious that you have a direction in
mind for it when you pair it with some invention
such as "untransform".

Maybe it's not all that bad, but it just seems
like it should be possible to do better than picking
a very general word like "transform" and giving
it our own special meaning.

-- 
Greg


From alexandre at peadrop.com  Fri May 16 05:40:17 2008
From: alexandre at peadrop.com (Alexandre Vassalotti)
Date: Thu, 15 May 2008 23:40:17 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482CF57D.6010200@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
	<482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz>
	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>
	<482CF57D.6010200@canterbury.ac.nz>
Message-ID: <acd65fa20805152040h75f79e34p17ac19248929c670@mail.gmail.com>

On Thu, May 15, 2008 at 10:46 PM, Greg Ewing
<greg.ewing at canterbury.ac.nz> wrote:
> Guido van Rossum wrote:
>
>> Really? Don't you think it's pretty obvious that b.transform("gzip")
>> compresses and b.untransform("gzip") decompresses? Or that
>> b.transform("base64") generates base64 format?
>
> Well, maybe. I think the problem is that the word
> "transform" is inherently direction-neutral, and it
> only becomes obvious that you have a direction in
> mind for it when you pair it with some invention
> such as "untransform".

Me, I don't have a problem with inventing a new word. It is true that
it would be slightly more appropriate to say "inverse_transform", but
that would be awful to type.

Personally, I find the meaning of transform/untransform intuitive, but
that's just me.

-- Alexandre

From tjreedy at udel.edu  Fri May 16 07:21:23 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Fri, 16 May 2008 01:21:23 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><48242D4A.3060802@egenix.com><ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com><482B11F8.2090200@egenix.com>
	<g0f42q$29h$1@ger.gmane.org><482B203B.3080305@egenix.com>
	<482B5BF4.1090007@gmail.com><ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com><482C0707.8020805@egenix.com>
	<482CC795.1050405@canterbury.ac.nz><ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>
	<482CF57D.6010200@canterbury.ac.nz>
Message-ID: <g0j5k6$i45$1@ger.gmane.org>


"Greg Ewing" <greg.ewing at canterbury.ac.nz> wrote in message 
news:482CF57D.6010200 at canterbury.ac.nz...
| Guido van Rossum wrote:
|
| > Really? Don't you think it's pretty obvious that b.transform("gzip")
| > compresses and b.untransform("gzip") decompresses? Or that
| > b.transform("base64") generates base64 format?
|
| Well, maybe. I think the problem is that the word
| "transform" is inherently direction-neutral, and it
| only becomes obvious that you have a direction in
| mind for it when you pair it with some invention
| such as "untransform".
|
| Maybe it's not all that bad, but it just seems
| like it should be possible to do better than picking
| a very general word like "transform" and giving
| it our own special meaning.

Would you prefer re_transform, which is English? 


From guido at python.org  Fri May 16 07:23:35 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 15 May 2008 22:23:35 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <g0j5k6$i45$1@ger.gmane.org>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<g0f42q$29h$1@ger.gmane.org> <482B203B.3080305@egenix.com>
	<482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
	<482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz>
	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>
	<482CF57D.6010200@canterbury.ac.nz> <g0j5k6$i45$1@ger.gmane.org>
Message-ID: <ca471dc20805152223ve0633f7mdfd025b93999f881@mail.gmail.com>

On Thu, May 15, 2008 at 10:21 PM, Terry Reedy <tjreedy at udel.edu> wrote:
>
> "Greg Ewing" <greg.ewing at canterbury.ac.nz> wrote in message
> news:482CF57D.6010200 at canterbury.ac.nz...
> | Guido van Rossum wrote:
> |
> | > Really? Don't you think it's pretty obvious that b.transform("gzip")
> | > compresses and b.untransform("gzip") decompresses? Or that
> | > b.transform("base64") generates base64 format?
> |
> | Well, maybe. I think the problem is that the word
> | "transform" is inherently direction-neutral, and it
> | only becomes obvious that you have a direction in
> | mind for it when you pair it with some invention
> | such as "untransform".
> |
> | Maybe it's not all that bad, but it just seems
> | like it should be possible to do better than picking
> | a very general word like "transform" and giving
> | it our own special meaning.
>
> Would you prefer re_transform, which is English?

Yuck, no.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From brett at python.org  Fri May 16 07:39:30 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 15 May 2008 22:39:30 -0700
Subject: [Python-3000] Help with finishing PEP 3108
Message-ID: <bbaeab100805152239l7086411bkc96b815234ed8e54@mail.gmail.com>

I need help to finish implementing PEP 3108. While over 80 modules are
now deprecated in Python 2.6 (of which I did over 50), there are
still over 20 tasks left to do in relation to the PEP. My free time is
being sucked away since I have a conference paper deadline of June 1.
And I am moving May 31. And I have to fly down to California to help
my mother move on June 4. And other personal stuff (see a certain
trend in my life at the moment?).

So if you have time to help, please see issue 2775
(http://bugs.python.org/issue2775) and the dependencies list. The
issues range from doing patch reviews of work people have already
done, renaming a module, creating a new package, removing some use
from the stdlib, or even backporting some changes made to 3.0 that
were never merged into 2.6. In other words a wide variety of things.
=)  The PEP outlines the steps necessary to deprecate a module for
deletion or for a rename in a step-by-step manner so you don't need to
worry about forgetting a step.

If you can't choose what to do, the issues that will lead to a module
being deleted are the highest priority, as renames can be handled by 2to3
in a later version while module deletions are harder to get pushed
through and accepted. The modules still left to remove are there
because they are still used somehow in the stdlib. The module renames
are mostly done at this point, but the new packages have not been
handled yet.

Obviously I don't want the beta to be held up by this, nor do I want
to see any of the work left out because I couldn't get to it all. So
any and all help is appreciated.

-Brett

From greg.ewing at canterbury.ac.nz  Fri May 16 09:31:45 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 16 May 2008 19:31:45 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <g0j5k6$i45$1@ger.gmane.org>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
	<482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz>
	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>
	<482CF57D.6010200@canterbury.ac.nz> <g0j5k6$i45$1@ger.gmane.org>
Message-ID: <482D3861.8050100@canterbury.ac.nz>

Terry Reedy wrote:

> Would you prefer re_transform, which is English?

Fiddling with the name of the antonym doesn't help.
The direction of "untransform" or whatever it's
called is only as clear as the direction of
"transform".

-- 
Greg

From mark.russell at zen.co.uk  Fri May 16 12:57:37 2008
From: mark.russell at zen.co.uk (Mark Russell)
Date: Fri, 16 May 2008 11:57:37 +0100
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482D3861.8050100@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
	<482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz>
	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>
	<482CF57D.6010200@canterbury.ac.nz> <g0j5k6$i45$1@ger.gmane.org>
	<482D3861.8050100@canterbury.ac.nz>
Message-ID: <4C64E79A-8395-44F4-9B38-8FFB4E001451@zen.co.uk>

On 16 May 2008, at 08:31, Greg Ewing wrote:
> Fiddling with the name of the antonym doesn't help.

How about adding a direction indicator?

      gzipped = plaintext.transformto("gzip")
      plaintext = gzipped.transformfrom("gzip")

Mark

From jjb5 at cornell.edu  Fri May 16 15:40:19 2008
From: jjb5 at cornell.edu (Joel Bender)
Date: Fri, 16 May 2008 09:40:19 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482D3861.8050100@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<g0f42q$29h$1@ger.gmane.org>	<482B203B.3080305@egenix.com>
	<482B5BF4.1090007@gmail.com>	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>	<482C0707.8020805@egenix.com>
	<482CC795.1050405@canterbury.ac.nz>	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>	<482CF57D.6010200@canterbury.ac.nz>
	<g0j5k6$i45$1@ger.gmane.org> <482D3861.8050100@canterbury.ac.nz>
Message-ID: <482D8EC3.1050303@cornell.edu>

> Fiddling with the name of the antonym doesn't help.
> The direction of "untransform" or whatever it's
> called is only as clear as the direction of
> "transform".

How about making the transformation parameter more descriptive?

    gzipped = plaintext.transform(plaintext_to_gzip)
    plaintext = gzipped.transform(gzip_to_plaintext)

I would rather have one function that can do lots of different 
transformations, the same name can be used for bytes and strings, the 
transformation can be subclassed, and it doesn't have to be reflexive if 
that doesn't make sense.

    somebytes.transform(ebcdic_to_plaintext)

OK, maybe that's not so common in YOUR world :-)

    pict = open('me.jpg', 'rb').read()
    y = pict.transform(jpeg_to_png).transform(plaintext_to_base64)


Joel


From ncoghlan at gmail.com  Fri May 16 16:06:07 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 17 May 2008 00:06:07 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482CC90A.5010907@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>
	<482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz>
Message-ID: <482D94CF.7090107@gmail.com>

Greg Ewing wrote:
> There's nothing wrong with that, but what it doesn't
> answer is why it's not sufficient just to do things
> like
> 
> from gzip import gzip_codec
> stream2 = BytesTransform(gzip_codec, stream1)
> 
> i.e. why there has to be a special kind of namespace
> for codecs.

Selecting an encoding is the kind of thing that will often come from the 
application's environment, or user preferences or configuration options, 
rather than being hardcoded at development time. With a flat, 
string-based codec namespace, those things are trivial to look up. 
Having to mess around with __import__ just to support a "choose 
compression method" configuration option would be fairly annoying.

The case for the special namespace is much stronger for the actual 
unicode encodings, but it still has at least some force for the 
bytes->bytes and str->str transforms.
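
A rough sketch of the configuration-driven lookup described above. It
assumes a Python 3 release where the bytes-to-bytes codecs are reachable
through codecs.encode() and codecs.decode(); the transform() method under
discussion is not used, and configured_name is a hypothetical stand-in for
a value read from a config file:

```python
import codecs

# Hypothetical setting; in practice this string would come from a config file.
configured_name = "zlib_codec"

def pack(data, codec_name):
    """Compress bytes with whichever bytes-to-bytes codec the name selects."""
    return codecs.encode(data, codec_name)

def unpack(data, codec_name):
    """Reverse pack() using the same codec name."""
    return codecs.decode(data, codec_name)

payload = b"spam and eggs " * 50
packed = pack(payload, configured_name)
restored = unpack(packed, configured_name)
```

Switching to another method is then only a matter of changing the string,
e.g. to "bz2_codec"; no import logic is involved.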

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From jjb5 at cornell.edu  Fri May 16 17:58:31 2008
From: jjb5 at cornell.edu (Joel Bender)
Date: Fri, 16 May 2008 11:58:31 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482D8EC3.1050303@cornell.edu>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<g0f42q$29h$1@ger.gmane.org>	<482B203B.3080305@egenix.com>	<482B5BF4.1090007@gmail.com>	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>	<482C0707.8020805@egenix.com>	<482CC795.1050405@canterbury.ac.nz>	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>	<482CF57D.6010200@canterbury.ac.nz>	<g0j5k6$i45$1@ger.gmane.org>
	<482D3861.8050100@canterbury.ac.nz> <482D8EC3.1050303@cornell.edu>
Message-ID: <482DAF27.4080905@cornell.edu>

I wrote:

> ...and it doesn't have to be reflexive if that...

Umm, that should have said 'have an inverse', which is different than 
reflexive or symmetric.  I get a little lost on 'surjective' and 
'injective', having been taught the terms 'onto' and 'one-to-one'.  But 
I digress.

For wrapping a file-like object, I would prefer a TransformIO class that 
takes read and write transform functions, e.g.,

     f = TransformIO(open('data.txt')
             , read=ebcdic_to_plaintext
             , write=plaintext_to_ebcdic
             )

These parameters would be optional, so if 'write' was omitted then write 
attempts would fail, likewise for 'read'.  functools.partial could 
be used to provide common transforms:

     ISO_8859_1_Transform = functools.partial( TransformIO
                      , read=ISO_8859_1_Decode
                      , write=ISO_8859_1_Encode
                      )

Where the to/from plain text is implicit.  And no, I'm not a huge fan of 
underbars.
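
One way such a wrapper might look, as a minimal sketch only: the class
below is an illustration of the TransformIO idea, not an existing stdlib
class, and the two transform functions are stand-ins built on the cp500
(EBCDIC) codec. It applies each transform to whole chunks, so it only
suits stateless, byte-for-character transforms.

```python
import io

class TransformIO:
    """Wrap a file-like object and pass data through optional transforms."""

    def __init__(self, raw, read=None, write=None):
        self._raw = raw
        self._read = read      # applied to data coming out of raw.read()
        self._write = write    # applied to data before raw.write()

    def read(self, size=-1):
        if self._read is None:
            raise io.UnsupportedOperation("no read transform supplied")
        return self._read(self._raw.read(size))

    def write(self, data):
        if self._write is None:
            raise io.UnsupportedOperation("no write transform supplied")
        return self._raw.write(self._write(data))

# Stand-ins for ebcdic_to_plaintext / plaintext_to_ebcdic (assumed names).
def ebcdic_to_plaintext(raw_bytes):
    return raw_bytes.decode("cp500")

def plaintext_to_ebcdic(text):
    return text.encode("cp500")

backing = io.BytesIO()
f = TransformIO(backing, read=ebcdic_to_plaintext, write=plaintext_to_ebcdic)
f.write("hello")
backing.seek(0)
text = f.read()
```

Omitting either keyword gives a read-only or write-only wrapper, which is
the behaviour described above.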


Joel

From stephen at xemacs.org  Sat May 17 01:17:29 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 17 May 2008 08:17:29 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482D8EC3.1050303@cornell.edu>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
	<482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz>
	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>
	<482CF57D.6010200@canterbury.ac.nz> <g0j5k6$i45$1@ger.gmane.org>
	<482D3861.8050100@canterbury.ac.nz> <482D8EC3.1050303@cornell.edu>
Message-ID: <87wsltsut2.fsf@uwakimon.sk.tsukuba.ac.jp>

Joel Bender writes:

 > > Fiddling with the name of the antonym doesn't help.
 > > The direction of "untransform" or whatever it's
 > > called is only as clear as the direction of
 > > "transform".
 > 
 > How about making the transformation parameter more descriptive?
 > 
 >     gzipped = plaintext.transform(plaintext_to_gzip)
 >     plaintext = gzipped.transform(gzip_to_plaintext)

+1

But why be verbose *and* ignore the vernacular?

    gzipped = plaintext.transform('gzip')
    plaintext = gzipped.transform('gunzip')

I think the style should be EIBTI for "private" protocols, and TOOWDTI
for transforms that wrap well-known libraries.

 > I would rather have one function that can do lots of different 
 > transformations, the same name can be used for bytes and strings,

This is a non-starter, because you don't know what the representation
of strings is.  We could be right-thinking and mandate that in the
.transform() context the string representation is considered
big-endian (and for little-endian platforms the bytes are swabbed
before applying the transformation).  But that would annoy all the
Wintel users because string.transform('zip') would produce gobbledygook
when unzipped from the command line.  And of course assuming a little-
endian representation is un-right-thinkable.<wink>

In this sense string-to-string and byte-to-byte *must* be kept
separate from "true" codecs.  I think it would be a very bad idea to
allow names to be shared for, say, byte-to-byte and string-to-byte
"gzip" for the reason given above.

Whether string-to-string and byte-to-byte need to share a namespace is
another question, but since we already need three (string->byte,
byte->string, byte->byte) that should be forced not to collide, I
don't think that there's that big a loss in requiring that
.transform('pig_latin') (string to string) be spelled differently from
.transform('pig_latin1') (byte to byte assuming ISO 8859/1 data).

Do you have use cases where byte-to-byte and string-to-string
transformations should share the same name?


From greg.ewing at canterbury.ac.nz  Sat May 17 10:26:50 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sat, 17 May 2008 20:26:50 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482D94CF.7090107@gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz>
	<482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz>
	<482D94CF.7090107@gmail.com>
Message-ID: <482E96CA.80103@canterbury.ac.nz>

Nick Coghlan wrote:

> Having to mess around with __import__ just to support a "choose 
> compression method" configuration option would be fairly annoying.

Perhaps, but even then, I'm not sure it makes sense to
lump them all into the same namespace.

If you're choosing a compression method, it makes sense
to choose 'zip', 'gzip', or 'bzip2', but less sense to
choose 'hex' or 'base64', and even less 'utf8' or 'latin1'.

Similarly there will be different appropriate sets for
video encoding, audio encoding, etc.

-- 
Greg


From mal at egenix.com  Sat May 17 11:17:45 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Sat, 17 May 2008 11:17:45 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482E96CA.80103@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<482C11B8.3010505@gmail.com>
	<482CC90A.5010907@canterbury.ac.nz>	<482D94CF.7090107@gmail.com>
	<482E96CA.80103@canterbury.ac.nz>
Message-ID: <482EA2B9.4090801@egenix.com>

On 2008-05-17 10:26, Greg Ewing wrote:
> Nick Coghlan wrote:
> 
>> Having to mess around with __import__ just to support a "choose 
>> compression method" configuration option would be fairly annoying.
> 
> Perhaps, but even then, I'm not sure it makes sense to
> lump them all into the same namespace.

Note that only the stdlib codecs are using one flat namespace.

Other codec packages may (and should) register their own codec
search functions and can then easily use other namespaces as well.

Think of the codec registry and access as a highly specialized
module import mechanism.

It is well possible to group codecs in packages and then access
them via their package name, e.g. 'compress.gzip'. However,
in practice, just writing 'gzip' is going to have enough
expressiveness to have a programmer understand what is
happening.
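
A small sketch of what registering such a search function could look
like with the existing codecs module. The "compress." prefix and the
helper names are made up for illustration; only codecs.register(),
codecs.CodecInfo and the gzip module functions are real APIs:

```python
import codecs
import gzip

def _gzip_encode(data, errors="strict"):
    # Codec functions return (output, number_of_input_items_consumed).
    return gzip.compress(data), len(data)

def _gzip_decode(data, errors="strict"):
    return gzip.decompress(data), len(data)

def compress_search(name):
    """Search function for a hypothetical 'compress.' codec namespace."""
    if name == "compress.gzip":
        return codecs.CodecInfo(encode=_gzip_encode,
                                decode=_gzip_decode,
                                name=name)
    return None  # let other search functions handle everything else

codecs.register(compress_search)

packed = codecs.encode(b"hello world", "compress.gzip")
restored = codecs.decode(packed, "compress.gzip")
```

The stdlib search function is consulted first and declines the dotted
name, so the lookup falls through to the package's own function.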

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 17 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From ncoghlan at gmail.com  Sat May 17 11:18:55 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 17 May 2008 19:18:55 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482E96CA.80103@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<482C11B8.3010505@gmail.com>
	<482CC90A.5010907@canterbury.ac.nz>	<482D94CF.7090107@gmail.com>
	<482E96CA.80103@canterbury.ac.nz>
Message-ID: <482EA2FF.5060306@gmail.com>

Greg Ewing wrote:
> Nick Coghlan wrote:
> 
>> Having to mess around with __import__ just to support a "choose 
>> compression method" configuration option would be fairly annoying.
> 
> Perhaps, but even then, I'm not sure it makes sense to
> lump them all into the same namespace.
> 
> If you're choosing a compression method, it makes sense
> to choose 'zip', 'gzip', or 'bzip2', but less sense to
> choose 'hex' or 'base64', and even less 'utf8' or 'latin1'.
> 
> Similarly there will be different appropriate sets for
> video encoding, audio encoding, etc.

The problem with that is that defining the categories becomes a fairly 
tedious chore. Having the codec namespace is convenient, and I don't see 
anything but downsides in trying to replace it with something more 
complicated.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From stephen at xemacs.org  Sat May 17 23:57:34 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 18 May 2008 06:57:34 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482E96CA.80103@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com>
	<482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com>
	<482E96CA.80103@canterbury.ac.nz>
Message-ID: <87od74fvap.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > If you're choosing a compression method, it makes sense
 > to choose 'zip', 'gzip', or 'bzip2', but less sense to
 > choose 'hex' or 'base64',

Doesn't "consenting adults" cover choosing a nonsensical compressor?
Do you really think that .transform clients will really choose
'base64' when they want 'lzma'?  If so, why isn't

    if compression_method not in ['zip', 'lzma']:
        raise PEBKAC_Error

sufficient protection?

 > and even less 'utf8' or 'latin1'.

These will fail the typing tests, since they are string->bytes, not
bytes->bytes.  These tests will be necessary, which could be
considered an argument against the flat namespace.


From greg.ewing at canterbury.ac.nz  Sun May 18 01:50:24 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Sun, 18 May 2008 11:50:24 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87od74fvap.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz>
	<482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz>
	<482D94CF.7090107@gmail.com> <482E96CA.80103@canterbury.ac.nz>
	<87od74fvap.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <482F6F40.4030902@canterbury.ac.nz>

Stephen J. Turnbull wrote:

> Do you really think that .transform clients will really choose
> 'base64' when they want 'lzma'?

It depends on who the "client" is. An application popping
up a list of compression methods is just going to confuse
users if it lists "base64" as a possibility.

So it already needs some application-specific notion of
what constitutes a probable compression method built
into it, and if that list is to be extensible, it needs
an application-specific registry to manage it. Once
you've got that, the general codec registry doesn't
help you much.

-- 
Greg

From stephen at xemacs.org  Sun May 18 05:05:59 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 18 May 2008 12:05:59 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482F6F40.4030902@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com>
	<482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com>
	<482E96CA.80103@canterbury.ac.nz>
	<87od74fvap.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482F6F40.4030902@canterbury.ac.nz>
Message-ID: <87lk28fh0o.fsf@uwakimon.sk.tsukuba.ac.jp>

Greg Ewing writes:

 > So it already needs some application-specific notion of
 > what constitutes a probable compression method built
 > into it, and if that list is to be extensible, it needs
 > an application-specific registry to manage it. Once
 > you've got that, the general codec registry doesn't
 > help you much.

Excuse me?  The codec-and-transform registry tells whether the codec
or transform is available in this Python; that's all it is supposed to
do.  Even if you do need an application-specific registry of
compressors, some Python-level registry is required to determine
whether a desired one is actually available and where it lives.

True, this could be done through the usual module mechanisms, but that
won't require any less coding than using the usual codec mechanism.
And I find Nick's rationale for a flat namespace of strings quite
convincing given that it won't cost any more.

I also suspect that it may make sense to allow various "standard
deobfuscations" of codec names as in glibc (whose version of iconv
considers "utf8", "UTF-8", and "Utf_8" to be equivalent names for
"Unicode UTF-8" according to rules which canonicalize case and strip
punctuation), as well as aliasing.  (These aren't strong reasons for
using a flat string registry, but they come more or less for free if
we do use it.)
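
For comparison, the existing codec registry already canonicalizes names
in roughly this way. A short check, using only codecs.lookup() (the list
of spellings is just an example):

```python
import codecs

# Different spellings of the same encoding name.
spellings = ["utf8", "UTF-8", "Utf_8"]

# Map each spelling to the canonical name reported by the registry.
canonical = {spelling: codecs.lookup(spelling).name for spelling in spellings}
```

All three spellings should resolve to the same codec, so an application
never has to normalize user-supplied names itself.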


From martin at v.loewis.de  Sun May 18 09:30:38 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Sun, 18 May 2008 09:30:38 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482D94CF.7090107@gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<482C11B8.3010505@gmail.com>
	<482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com>
Message-ID: <482FDB1E.3010303@v.loewis.de>

> Selecting an encoding is the kind of thing that will often come from the
> application's environment, or user preferences or configuration options,
> rather than being hardcoded at development time.

And that's the main reason why having encode/decode is a good idea,
and having transform/untransform is a bad idea.

Encoding names are in configuration data all the time, or even in the
actual data (e.g. in MIME); they are rarely hardcoded.

You typically *don't* read the name of transformations from a
configuration file. And even if they are in configuration, you
typically have a fixed set of options, rather than an extensible
one.

> With a flat,
> string-based codec namespace, those things are trivial to look up.
> Having to mess around with __import__ just to support a "choose
> compression method" configuration option would be fairly annoying.

I wouldn't mess with import:

import gzip, bz2
compressors = {"gzip":gzip.StreamCompressor,
               "bzip2":bz2.BZ2Compressor}
decompressors={"gzip":gzip.StreamDecompressor,
               "bzip2":bz2.BZ2Decompressor}

It's not that people invent new compression methods every day.

OTOH, these things have often more complex parameters than just
a name; e.g. the compressors also take a compression level. In
these cases, using

  output_to = compressors[name](compresslevel=complevel)

could work fine (as both might happen to support the compresslevel
keyword argument).
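
A runnable variant of the same idea, as a sketch under assumptions: it
uses zlib.compressobj and bz2.BZ2Compressor from the current standard
library instead of the gzip class names above, and passes the level
positionally because the two constructors do not share a keyword name.
The roundtrip() helper is illustrative only.

```python
import bz2
import zlib

# Fixed table of supported methods; each factory takes a compression level.
compressors = {
    "zlib": lambda level: zlib.compressobj(level),
    "bzip2": lambda level: bz2.BZ2Compressor(level),
}
decompressors = {
    "zlib": zlib.decompressobj,
    "bzip2": bz2.BZ2Decompressor,
}

def roundtrip(name, data, level=9):
    """Compress and then decompress data with the method selected by name."""
    compressor = compressors[name](level)
    packed = compressor.compress(data) + compressor.flush()
    return decompressors[name]().decompress(packed)

payload = b"the quick brown fox " * 100
results = {name: roundtrip(name, payload) for name in compressors}
```

An unknown name simply raises KeyError, which doubles as the check that
the configured method is one of the supported options.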

> The case for the special namespace is much stronger for the actual
> unicode encodings, but it still has at least some force for the
> bytes->bytes and str->str transforms.

Not to me, no.

Regards,
Martin

From regebro at gmail.com  Sun May 18 16:38:01 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Sun, 18 May 2008 16:38:01 +0200
Subject: [Python-3000] Python incompatibility test project.
Message-ID: <319e029f0805180738g633ccef0ke2ebf0bbec200b1c@mail.gmail.com>

Hi all!

I have created a project to make tests for all incompatibilities
between Python 2.5, 2.6 and 3.0. It's hosted on Google code:

  http://code.google.com/p/python-incompatibility/

It currently contains what I believe to be complete tests of language
incompatibilities. It also contains example code of how to avoid the
incompatibility if possible and hence write code running under both
2.6 and 3.0.
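
As an illustration of the kind of code meant here (this sketch is not
taken from the project; the describe() function is made up), the
following avoids several of the listed incompatibilities and should run
unchanged under 2.6 and 3.0:

```python
from __future__ import division, print_function

import sys

def describe(mapping):
    """Use only constructs that behave the same under 2.6 and 3.0."""
    first_key = sorted(mapping)[0]   # instead of mapping.keys()[0]
    half = len(mapping) / 2          # true division in both versions
    whole = len(mapping) // 2        # explicit floor division
    try:
        int("not a number")
    except ValueError as exc:        # "as" syntax, accepted from 2.6 on
        message = str(exc)
    print(first_key, half, whole, file=sys.stderr)  # print as a function
    return first_key, half, whole, message

result = describe({"b": 1, "a": 2, "c": 3})
```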

Files called test_something25.py run under Python 2.5 and 2.6 but
should fail under Python 3.0.
Files called test_something30.py run under Python 3.0, but should
fail under Python 2.5.
Files called test_something26.py run under Python 2.6 and Python 3.0.

It also contains a test runner, runtest.py, and another testrunner
that prints out the test in a nice grid, called makereport.py. Both
these run under python2.4 to 3.0. makereport.py requires you to have
python2.5, python2.6 and python3.0 installed in the path.

There are as of today no tests of the standard library changes, but I
would like to have them. Help with this is appreciated, ask and ye shall
receive commit rights. :) I could also have missed some language
incompatibility.

The report output as of just now is:

                                       Python 2.5 code  Python 2.6 code  Python 3.0 code
             Group               Test   2.5  2.6  3.0    2.5  2.6  3.0    2.5  2.6  3.0
   classic_classes                MRO    Y    Y    N      Y    Y    Y      N    N    Y
                           class_type    Y    Y    N      Y    Y    Y      -    -    -
              dict  dynamic_key_views    -    -    -      -    -    -      N    N    Y
                             iterator    Y    Y    N      Y    Y    Y      N    N    Y
                              slicing    Y    Y    N      Y    Y    Y      -    -    -
                              sorting    Y    Y    N      Y    Y    Y      -    -    -
          division           division    Y    Y    N      Y    Y    Y      N    N    Y
  exception_syntax   exception_syntax    Y    Y    N      N    Y    Y      N    Y    Y
            filter             filter    Y    Y    N      Y    Y    Y      -    -    -
              long               long    Y    Y    N      Y    Y    Y      N    N    Y
               map                map    Y    Y    N      Y    Y    Y      -    -    -
             print         print_file    Y    Y    N      N    Y    Y      N    N    Y
                         print_stdout    Y    Y    N      N    Y    Y      N    N    Y
             range              range    Y    Y    N      Y    Y    Y      -    -    -
            reduce             reduce    Y    Y    N      N    Y    Y      N    Y    Y
              sort               sort    Y    Y    N      Y    Y    Y      -    -    -
                               sorted    Y    Y    N      Y    Y    Y      -    -    -
 string_exceptions  string_exceptions    Y    N    N      -    -    -      -    -    -
           unicode            unicode    Y    Y    N      N    Y    Y      N    N    Y
            xrange             xrange    Y    Y    N      Y    Y    Y      N    N    Y
Note the following:
 - All 2.6 tests run under both 2.6 and 3.0. Python3 is not so
incompatible as rumour has it. :-)
 - There are fewer tests for 3.0 than for 2.5. Much of the
incompatibility for 3.0 is that you can't do some bad programming that
you could in 2.x. For example you can't do "adict.keys()[5]" in 3.0.
But why on earth would you misuse dicts like that? :-) Python 3 will
force you to write good code in some cases where in 2.5 you can write
bad code. :-) So the better your code, the easier to port to Python 3.
;-)
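
To make the dict example above concrete: in Python 3, keys() returns a
view object rather than a list, so indexing it fails. A minimal sketch
(the name adict and its contents are made up for illustration):

```python
adict = {"a": 1, "b": 2, "c": 3}

# In Python 3, keys() returns a view object, which does not support indexing.
try:
    adict.keys()[1]
    indexing_worked = True
except TypeError:
    indexing_worked = False

# Building the list explicitly (and choosing the ordering you rely on)
# works the same way under 2.x and 3.x.
second_key = sorted(adict.keys())[1]
```

Spelling out sorted() or list() also documents which ordering the code
depends on, which the 2.x indexing idiom left implicit.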

Feedback and help are greatly appreciated! No Python 3 experience
necessary; this is a fun way to get to know Python 3!

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From regebro at gmail.com  Sun May 18 17:03:16 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Sun, 18 May 2008 17:03:16 +0200
Subject: [Python-3000] Python incompatibility test project.
In-Reply-To: <319e029f0805180738g633ccef0ke2ebf0bbec200b1c@mail.gmail.com>
References: <319e029f0805180738g633ccef0ke2ebf0bbec200b1c@mail.gmail.com>
Message-ID: <319e029f0805180803j69503e55jdedfd7b092089e4@mail.gmail.com>

On Sun, May 18, 2008 at 4:38 PM, Lennart Regebro <regebro at gmail.com> wrote:
> It currently contains what I believe to be complete tests of language
> incompatibilities.

Although I just realized that there is a bunch of builtin methods
that are gone which I don't have tests for. Ah well.

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From paul.bedaride at gmail.com  Sun May 18 23:33:34 2008
From: paul.bedaride at gmail.com (paul bedaride)
Date: Sun, 18 May 2008 23:33:34 +0200
Subject: [Python-3000] Metaclass Vs Class Decorator
Message-ID: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>

I have seen PEPs 3115 and 3129, about metaclasses and class decorators.

I think that PEP 3129 needs to be improved to show how to declare
a decorator, and not just how to apply one.

I also wonder whether we need both of these things, or whether they are
just two ways to express the same semantics.

That is why I want to know how to write a class decorator, so that I can
make a comparison.
paul bedaride

From g.brandl at gmx.net  Sun May 18 23:35:04 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 18 May 2008 23:35:04 +0200
Subject: [Python-3000] Metaclass Vs Class Decorator
In-Reply-To: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>
References: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>
Message-ID: <g0q7f2$f90$1@ger.gmane.org>

paul bedaride schrieb:
> I see the peps 3115 and 3129 about metaclass and class decorators.
> 
> I think that the pep 3129 need to be improved for show the way to declare
> the decorator and not just the way to appy them.
> 
> I also wonder if we need this two things, and if that is not two way to 
> explain
> the same semantic.
> 
> It's why a want to know how to express the class decorator for making a 
> comparison

A class decorator works exactly like a function decorator, that is,

@foo
class X: ...

is equivalent to

class X: ...
X = foo(X)

This should be all you need to know in order to write a class decorator.
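
As a minimal sketch of that equivalence (the decorator add_repr and the
classes Point and Point2 are invented for illustration, not taken from
PEP 3129):

```python
def add_repr(cls):
    """Class decorator: attach a simple __repr__ and return the same class."""
    def __repr__(self):
        return "<%s %r>" % (cls.__name__, vars(self))
    cls.__repr__ = __repr__
    return cls  # whatever is returned gets bound to the class name


@add_repr
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# The same effect, spelled without the @ syntax:
class Point2:
    def __init__(self, x):
        self.x = x

Point2 = add_repr(Point2)
```

A decorator is free to return a different object than the class it was
given; returning the class itself, as here, is just the simplest case.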

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From python at rcn.com  Mon May 19 05:36:54 2008
From: python at rcn.com (Raymond Hettinger)
Date: Sun, 18 May 2008 20:36:54 -0700
Subject: [Python-3000] Metaclass Vs Class Decorator
References: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>
	<g0q7f2$f90$1@ger.gmane.org>
Message-ID: <00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1>

>> It's why a want to know how to express the class decorator for making a 
>> comparison

[Georg]
> A class decorator works exactly like a function decorator, that is,
> 
> @foo
> class X: ...
> 
> is equivalent to
> 
> class X: ...
> X = foo(X)
> 
> This should be all you need to know in order to write a class decorator.

I concur.


Raymond

From ncoghlan at gmail.com  Mon May 19 07:06:55 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Mon, 19 May 2008 15:06:55 +1000
Subject: [Python-3000] Metaclass Vs Class Decorator
In-Reply-To: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>
References: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>
Message-ID: <48310AEF.8010408@gmail.com>

paul bedaride wrote:
> I also wonder if we need this two things, and if that is not two way to 
> explain
> the same semantic.

Changing the metaclass can lead to some fundamental changes to the way a 
class operates. Class decorators are for simpler things which don't 
require major changes to the class, and, in particular, things which 
shouldn't automatically be inherited by subclasses.

The specific motivating example in the python-dev thread linked from PEP 
3129 was a class registry where being a subclass of an already 
registered class didn't necessarily imply that the subclass should also be
registered. This semantic is painful to implement using a metaclass, but 
trivial with a class decorator.

"Should subclasses implicitly inherit this behaviour" is actually a 
pretty decent rule of thumb for deciding whether something should be 
handled with a metaclass or a class decorator.
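
A rough sketch of that rule of thumb, using a registry as in the
motivating example (all names here, such as registry, register and
AutoRegister, are hypothetical):

```python
registry = []


def register(cls):
    """Class decorator: registers only the class it is applied to."""
    registry.append(cls.__name__)
    return cls


class AutoRegister(type):
    """Metaclass: every class created through it is registered."""
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        registry.append(name)


@register
class Plugin:
    pass


class SubPlugin(Plugin):      # not registered: the decorator is not inherited
    pass


class Handler(metaclass=AutoRegister):
    pass


class SubHandler(Handler):    # registered: the metaclass is inherited
    pass
```

After this runs, registry holds "Plugin", "Handler" and "SubHandler" but
not "SubPlugin", which is the behaviour that is awkward to get from a
metaclass alone.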

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From guido at python.org  Mon May 19 17:14:20 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 19 May 2008 08:14:20 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482FDB1E.3010303@v.loewis.de>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com>
	<482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com>
	<482FDB1E.3010303@v.loewis.de>
Message-ID: <ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>

On Sun, May 18, 2008 at 12:30 AM, "Martin v. Löwis" <martin at v.loewis.de> wrote:
>> Selecting an encoding is the kind of thing that will often come from the
>> application's environment, or user preferences or configuration options,
>> rather than being hardcoded at development time.
>
> And that's the main difference why having encode/decode is a good idea,
> and having transform/untransform is a bad idea.
>
> Encoding names are in configuration data all the time, or even in the
> actual data (e.g. in MIME); they rarely are in configuration.
>
> You typically *don't* read the name of transformations from a
> configuration file. And even if they are in configuration, you
> typically have a fixed set of options, rather than an extensible
> one.
>
>> With a flat,
>> string-based codec namespace, those things are trivial to look up.
>> Having to mess around with __import__ just to support a "choose
>> compression method" configuration option would be fairly annoying.
>
> I wouldn't mess with import:
>
> import gzip, bz2
> compressors = {"gzip":gzip.StreamCompressor,
>               "bzip2":bz2.BZ2Compressor}
> decompressors={"gzip":gzip.StreamDecompressor,
>               "bzip2":bz2.BZ2Decompressor}
>
> It's not that people invent new compression methods every day.
>
> OTOH, these things have often more complex parameters than just
> a name; e.g. the compressors also take a compression level. In
> these cases, using
>
>  output_to = compressors[name](compresslevel=complevel)
>
> could work fine (as both might happen to support the compresslevel
> keyword argument).
>
>> The case for the special namespace is much stronger for the actual
>> unicode encodings, but it still has at least some force for the
>> bytes->bytes and str->str transforms.
>
> Not to me, no.

Hm, Martin is pretty convincing here. Before we go ahead and accept
.transform() and friends (by whatever name) we should look for
convincing use cases where the transformation is typically given by
some other input, rather than hard-coded in the app. (And cases where
there are two or three possibilities from a fixed menu don't count --
so that would rule out Content-transfer-encoding.)
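
For reference, the fixed-menu lookup Martin describes can be sketched
with functions that do exist in the standard library. This is only an
illustration: it swaps in the one-shot zlib and bz2 functions rather
than the class names in the quoted message, and the helpers pack and
unpack are made up.

```python
import bz2
import zlib

# A fixed menu of compression methods, keyed by a configuration string.
compressors = {"zlib": zlib.compress, "bzip2": bz2.compress}
decompressors = {"zlib": zlib.decompress, "bzip2": bz2.decompress}


def pack(data, name, level=6):
    """Compress data with the method selected by name."""
    # Both functions accept the compression level as second positional argument.
    return compressors[name](data, level)


def unpack(blob, name):
    """Reverse pack() for the same method name."""
    return decompressors[name](blob)
```

An unknown name fails with a plain KeyError, so the set of accepted
methods is exactly what the application chose to list.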

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jjb5 at cornell.edu  Mon May 19 17:53:11 2008
From: jjb5 at cornell.edu (Joel Bender)
Date: Mon, 19 May 2008 11:53:11 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87wsltsut2.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<g0f42q$29h$1@ger.gmane.org>	<482B203B.3080305@egenix.com>	<482B5BF4.1090007@gmail.com>	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>	<482C0707.8020805@egenix.com>	<482CC795.1050405@canterbury.ac.nz>	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>	<482CF57D.6010200@canterbury.ac.nz>	<g0j5k6$i45$1@ger.gmane.org>	<482D3861.8050100@canterbury.ac.nz>	<482D8EC3.1050303@cornell.edu>
	<87wsltsut2.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4831A267.5000304@cornell.edu>

Stephen J. Turnbull wrote:

> But why be verbose *and* ignore the vernacular?
> 
>     gzipped = plaintext.transform('gzip')
>     plaintext = gzipped.transform('gunzip')

I'm generally resistant to a registry, none of my applications are so 
general that they would take advantage of a 
string-key-to-dictionary-to-function-pointer.  If they did, they would 
have to have some pretty severe constraints on what functions can be 
selected, so I would end up building my own context sensitive dictionary 
of available functions.   I'm in favor of:

     gzipped = plaintext.transform(zlib.compress)
     plaintext = gzipped.transform(zlib.decompress)

So, you may ask, why would that be any better than this...

     gzipped = zlib.compress(plaintext)

...and the answer is that it depends on what you consider the most 
appropriate design pattern to follow.

> I think the style should be EIBTI for "private" protocols, and TOOWDTI
> for transforms that wrap well-known libraries.

I've been around socket libraries and protocol encoding/decoding stacks 
too long I guess, or I'm just jaded, but TOOWDTI is a pipe dream. 
There's Only One Blessed Way To Do It I can understand and appreciate.

EIBTI trumps TOOWDTI when it has to go through a registry.  I would be 
-1 on this design:

     In module codecs:

         from gzip import compress as _gzip_compress
         ...
         _registry['gzip'] = _gzip_compress

Where there is a great deal of code that enforces TOOWDTI, effectively 
obfuscating the fact that all you're passing to transform() is nothing more 
magical than a reference to a function.

> This is a non-starter, because you don't know what the representation
> of strings is.

If you're working on that kind of application.  My applications have to 
know what the items in the sequence are, or they have to figure it out, 
but when it comes time to do the transformation, they know.

> We could be right-thinking and mandate that in the
> .transform() context the string representation is considered
> big-endian (and for little-endian platforms the bytes are swabbed
> before applying the transformation).

Yuck.

> But that would annoy all the Wintel users because string.transform('zip')
> would produce gobbledgook when unzipped from the command line.  And
> of course assuming a little-endian representation is un-right-thinkable.

It would annoy me because mandating the format of the input is up to the 
transformation function, not the transform().

     y = x.transform(f)

If there is some endian restriction on f, it should detect it and 
enforce it, or if it can't, document it.  If there is some platform 
strangeness, it should take that into account.
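
As a sketch only, the callable-based form could look like the helper
below. The free function transform() is hypothetical (no such method
exists on bytes or str), and the same-type check is an assumption about
what such a method might enforce.

```python
import zlib


def transform(data, func):
    """Hypothetical helper: apply func and insist on a same-type result."""
    result = func(data)
    if type(result) is not type(data):
        raise TypeError(
            "transform expected %s, got %s"
            % (type(data).__name__, type(result).__name__)
        )
    return result


gzipped = transform(b"plain text" * 10, zlib.compress)
plaintext = transform(gzipped, zlib.decompress)
```

Any restriction on the input format stays inside the function that is
passed in; the helper itself only checks the type of the result.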

> In this sense string-to-string and byte-to-byte *must* be kept
> separate from "true" codecs.

I don't know of any codecs that aren't true.  Some may be more popular or 
common than others, and the more popular ones may be blessed by being 
presented as easily accessible, just like your gunzip === gzip_to_plaintext.

> I think it would be a very bad idea to allow names to be shared
> for, say, byte-to-byte and string-to-byte "gzip" for the reason
> given above.

I don't agree, only because I've written plenty of functions that can 
take a variety of different kinds of inputs as a convenience.  If 
zlib.compress can take bytes or strings I would be fine with that, and 
if I could be more explicit, e.g.,

     gzipped = plainbytes.transform(zlib.compress_bytes)

I would be even happier.   What is not available in Python that is in 
C++, and believe me, I don't miss it all THAT much, is a way to select 
the appropriate function based on both the input and output. 
Annotations would have been a way to do it, but there's far too many 
people that don't like it for very good reasons.

> Whether string-to-string and byte-to-byte need to share a namespace is
> another question, but since we already need three (string->byte,
> byte->string, byte->byte) that should be forced not to collide, I
> don't think that there's that big a loss in requiring that
> .transform('pig_latin') (string to string) be spelled differently from
> .transform('pig_latin1') (byte to byte assuming ISO 8859/1 data).

I agree, and I don't think there's an advantage to passing string names.

     import piglatin as pig
     piggy = mytext.transform(pig.latin1_encode)

I'm -1 on transform.register('pig_latin1', pig.latin1_encode).

> Do you have use cases where byte-to-byte and string-to-string
> transformations should share the same name?

Not in the same module.


Joel


From guido at python.org  Mon May 19 18:10:19 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 19 May 2008 09:10:19 -0700
Subject: [Python-3000] Metaclass Vs Class Decorator
In-Reply-To: <00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1>
References: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>
	<g0q7f2$f90$1@ger.gmane.org>
	<00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1>
Message-ID: <ca471dc20805190910x48fd34ccpbf14e6b7cd8d1477@mail.gmail.com>

On Sun, May 18, 2008 at 8:36 PM, Raymond Hettinger <python at rcn.com> wrote:
>>> It's why a want to know how to express the class decorator for making a
>>> comparison
>
> [Georg]
>>
>> A class decorator works exactly like a function decorator, that is,
>>
>> @foo
>> class X: ...
>>
>> is equivalent to
>>
>> class X: ...
>> X = foo(X)
>>
>> This should be all you need to know in order to write a class decorator.
>
> I concur.

Technically, that's true, but an example wouldn't hurt. Examples also
help understanding the motivation. Even the difference between class
decorators and metaclasses could be explained with examples. (E.g. a
metaclass that auto-registers its classes vs. a class decorator that
registers a class.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at egenix.com  Mon May 19 19:03:53 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 19 May 2008 19:03:53 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>
	<482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz>
	<482D94CF.7090107@gmail.com>	<482FDB1E.3010303@v.loewis.de>
	<ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>
Message-ID: <4831B2F9.8040001@egenix.com>

On 2008-05-19 17:14, Guido van Rossum wrote:
> Hm, Martin is pretty convincing here. Before we go ahead and accept
> .transform() and friends (by whatever name) we should look for
> convincing use cases where the transformation is typically given by
> some other input, rather than hard-coded in the app. (And cases where
> there are two or three possibilities from a fixed menu don't count --
> so that would rule out Content-transfer-encoding.)

The .transform() methods are meant as an interface to same-type
codecs in general, not just compression algorithms.

They are convenience methods to the codecs registry
with the added benefit of applying type checks which the codecs
registry does not guarantee since it only manages codecs.

Of course, you can write everything directly against the codec
registry or some other specialized interface, but that's not
really what we're after here.

The methods are meant to make code easy to write in the
general use case, without having to worry about special
parameters or finding the right module and function names.

Motivation: When was the last time you used a gzip compression
option (ie. yes there are options, but do you use them in the
general use case) ? Can you write code that applies UU encoding
without looking up the details in the documentation (ie. there
is a module for doing UU-encoding in the stdlib, but what's its
name, what's the function, does it need extra logic) ?

The motivation is not driven by having the need to pass a
configuration parameter to a .transform() method.

It's being able to write

     str.transform('gzip').transform('uu')

which doesn't require knowledge about the modules doing the actual
work behind the scenes.

We're not adding those methods because there's no other way
to get the functionality. It's all about usability, readability
and PEP20 ("Beautiful is better than ugly.").

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 19 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From python at rcn.com  Mon May 19 19:19:06 2008
From: python at rcn.com (Raymond Hettinger)
Date: Mon, 19 May 2008 10:19:06 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com><482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz><482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz><482D94CF.7090107@gmail.com>	<482FDB1E.3010303@v.loewis.de><ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>
	<4831B2F9.8040001@egenix.com>
Message-ID: <006b01c8b9d4$730657e0$ac00a8c0@RaymondLaptop1>

[MAL] 
> It's being able to write
> 
>     str.transform('gzip').transform('uu')
> 
> which doesn't require knowledge about the modules doing the actual
> work behind the scenes.

What is the reverse operation for the above example:  str.untransform('uu').untransform('gzip')?

Why can't we use codecs and stick with the usual encode/decode methods?


Raymond

From jjb5 at cornell.edu  Mon May 19 19:21:38 2008
From: jjb5 at cornell.edu (Joel Bender)
Date: Mon, 19 May 2008 13:21:38 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <482FDB1E.3010303@v.loewis.de>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz>
	<482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de>
Message-ID: <4831B722.3070707@cornell.edu>

Martin v. Löwis wrote:

> And that's the main difference why having encode/decode is a good idea,
> and having transform/untransform is a bad idea.

I agree that 'untransform' is a bad name for the inverse of transform, 
but I don't think 'transform' is bad.  For me the distinction is 
existence of a 'model'.

     sequence -> model -> sequence

...is different than...

     sequence -> sequence

where 'sequence' is a string, bytes or stream.  In transformations there 
is no intermediate model.

> OTOH, these things have often more complex parameters than just
> a name; e.g. the compressors also take a compression level. In
> these cases, using
> 
>   output_to = compressors[name](compresslevel=complevel)
> 
> could work fine (as both might happen to support the compresslevel
> keyword argument).

Your example seems to indicate a model->sequence operation, that I would 
call 'encode'.  Now the question becomes, given 'f', what makes more sense:

     (a)  y = x.transform(f)
     (b)  y = x.encode(f)
     (c)  y = f(x)

What do you expect the function signature of 'output_to' to be?  Is it 
callable?  Is it something that is going to be a stream wrapper, that 
has .read() and .write()?  Is it an intermediary, something that can be 
built as an object and bound between two streams bidirectionally?

     f().transform(x, y)

Another case, which would suffer from as much if not more API confusion, 
would be encrypting and decrypting...

     from Crypto.Cipher import DES

     obj = DES.new('abcdefgh', DES.ECB)
     plain = "Guido van Rossum is a space alien.XXXXXX"

In this case using .transform() would seem to be a good fit because 
there is no model, but 'obj' suffers from being directionless, so it 
becomes this...

     ciph = plain.transform(obj.encrypt)

...which isn't substantially clearer than...

     ciph = obj.encrypt(plain)

Parametric transformations don't bother me, but that would be an 
indication that there's a lot more going on, and perhaps there are 
better (and pre-existing) labels for these functions.


Joel

From paul.bedaride at gmail.com  Mon May 19 19:34:17 2008
From: paul.bedaride at gmail.com (paul bedaride)
Date: Mon, 19 May 2008 19:34:17 +0200
Subject: [Python-3000] Metaclass Vs Class Decorator
In-Reply-To: <ca471dc20805190910x48fd34ccpbf14e6b7cd8d1477@mail.gmail.com>
References: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>
	<g0q7f2$f90$1@ger.gmane.org>
	<00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1>
	<ca471dc20805190910x48fd34ccpbf14e6b7cd8d1477@mail.gmail.com>
Message-ID: <fa7d4c4f0805191034l239ffc7al65d54a44355ba773@mail.gmail.com>

I have thought about it, and I think these are two different ways of
applying a similar thing.

That is why I wonder whether it would be good for metaclasses and class
decorators to have the same interface, so that we could use a class
either as a metaclass or as a decorator?

paul bedaride

On Mon, May 19, 2008 at 6:10 PM, Guido van Rossum <guido at python.org> wrote:

> On Sun, May 18, 2008 at 8:36 PM, Raymond Hettinger <python at rcn.com> wrote:
> >>> It's why a want to know how to express the class decorator for making a
> >>> comparison
> >
> > [Georg]
> >>
> >> A class decorator works exactly like a function decorator, that is,
> >>
> >> @foo
> >> class X: ...
> >>
> >> is equivalent to
> >>
> >> class X: ...
> >> X = foo(X)
> >>
> >> This should be all you need to know in order to write a class decorator.
> >
> > I concur.
>
> Technically, that's true, but an example wouldn't hurt. Examples also
> help understanding the motivation. Even the difference between class
> decorators and metaclasses could be explained with examples. (E.g. a
> metaclass that auto-registers its classes vs. a class decorator that
> registers a class.)
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Mon May 19 19:36:34 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 19 May 2008 10:36:34 -0700
Subject: [Python-3000] Metaclass Vs Class Decorator
In-Reply-To: <fa7d4c4f0805191034l239ffc7al65d54a44355ba773@mail.gmail.com>
References: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>
	<g0q7f2$f90$1@ger.gmane.org>
	<00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1>
	<ca471dc20805190910x48fd34ccpbf14e6b7cd8d1477@mail.gmail.com>
	<fa7d4c4f0805191034l239ffc7al65d54a44355ba773@mail.gmail.com>
Message-ID: <ca471dc20805191036h3bd714c2x2f358a9c37677749@mail.gmail.com>

You ought to ask this on c.l.py. The designers of the feature were
well aware of the similarities, and also of the differences, and the
decision was made to have both. Explaining this to every person who
asks is not a good use of our time.

On Mon, May 19, 2008 at 10:34 AM, paul bedaride <paul.bedaride at gmail.com> wrote:
> I think about it, and I think that it's two differents way of applying a
> similar thing,
>
> it's why I wonder, if this can't be good if metaclass and class decorator
> have the same
> interface, then we can use a class as a metaclass or as a decorator ??
>
> paul bedaride
>
> On Mon, May 19, 2008 at 6:10 PM, Guido van Rossum <guido at python.org> wrote:
>>
>> On Sun, May 18, 2008 at 8:36 PM, Raymond Hettinger <python at rcn.com> wrote:
>> >>> It's why a want to know how to express the class decorator for making
>> >>> a
>> >>> comparison
>> >
>> > [Georg]
>> >>
>> >> A class decorator works exactly like a function decorator, that is,
>> >>
>> >> @foo
>> >> class X: ...
>> >>
>> >> is equivalent to
>> >>
>> >> class X: ...
>> >> X = foo(X)
>> >>
>> >> This should be all you need to know in order to write a class
>> >> decorator.
>> >
>> > I concur.
>>
>> Technically, that's true, but an example wouldn't hurt. Examples also
>> help understanding the motivation. Even the difference between class
>> decorators and metaclasses could be explained with examples. (E.g. a
>> metaclass that auto-registers its classes vs. a class decorator that
>> registers a class.)
>>
>> --
>> --Guido van Rossum (home page: http://www.python.org/~guido/)


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at egenix.com  Mon May 19 19:36:44 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 19 May 2008 19:36:44 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <006b01c8b9d4$730657e0$ac00a8c0@RaymondLaptop1>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com><482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz><482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz><482D94CF.7090107@gmail.com>	<482FDB1E.3010303@v.loewis.de><ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>	<4831B2F9.8040001@egenix.com>
	<006b01c8b9d4$730657e0$ac00a8c0@RaymondLaptop1>
Message-ID: <4831BAAC.9050403@egenix.com>

On 2008-05-19 19:19, Raymond Hettinger wrote:
> [MAL]
>> It's being able to write
>>
>>     str.transform('gzip').transform('uu')
>>
>> which doesn't require knowledge about the modules doing the actual
>> work behind the scenes.
> 
> What is the reverse operation for the above example:  
> str.untransform('uu').untransform('gzip')?

Yes.

BTW: Since the codecs do bytes->bytes conversion, I should have
written bytes.transform('gzip').transform('uu')

> Why can't we use codecs and stick with the usual encode/decode methods?

That's what you can do in Python 2.x.

In Py 3.x, .encode() and .decode() have strict type requirements
on their return types. .transform() and .untransform() return the
same type, .encode() and .decode() return bytes and str resp.
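
For comparison, the module-level codecs.encode() and codecs.decode()
functions are not bound by the str/bytes restriction of the methods. A
small sketch, assuming a Python 3 release in which the bytes-to-bytes
and str-to-str codecs are available under the names "zlib_codec" and
"rot_13" (worth checking against the codecs documentation for the
version in use):

```python
import codecs

# bytes -> bytes, through the codec registry rather than a bytes method
packed = codecs.encode(b"hello world", "zlib_codec")
restored = codecs.decode(packed, "zlib_codec")

# str -> str
rot = codecs.encode("hello", "rot_13")

# The str method is restricted to str -> bytes
as_bytes = "hello".encode("utf-8")
```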

-- 
Marc-Andre Lemburg
eGenix.com


From tjreedy at udel.edu  Mon May 19 20:12:54 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Mon, 19 May 2008 14:12:54 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com><482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz><482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz><482D94CF.7090107@gmail.com>	<482FDB1E.3010303@v.loewis.de><ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>
	<4831B2F9.8040001@egenix.com>
Message-ID: <g0sfv4$30r$1@ger.gmane.org>


"M.-A. Lemburg" <mal at egenix.com> wrote in message 
news:4831B2F9.8040001 at egenix.com...
| Motivation: When was the last time you used a gzip compression
| option (ie. yes there are options, but do you use them in the
| general use case) ? Can you write code that applies UU encoding
| without looking up the details in the documentation (ie. there
| is a module for doing UU-encoding in the stdlib, but what's its
| name, what's the function, does it need extra logic) ?

This suggests to me the possibility of two more packages for the 
reorganized stdlib: b2b and s2s.  Or of consolidating most transform 
functions into one module, just as math and cmath consolidate float and 
complex transforms -- some with inverses and some not.

IOW, I think .transform may be the wrong solution to library 
disorganization.

| The motivation is not driven by having the need to pass a
| configuration parameter to a .transform() method.
|
| It's being able to write
|
|     str.transform('gzip').transform('uu')

To me, this is to
    uu(gzip(s))
as
    somefloat.transform('cos').transform('sin')
is to
    sin(cos(somefloat))

| which doesn't require knowledge about the modules doing the actual
| work behind the scenes.

It does require knowledge of the registered name.

| We're not adding those methods because there's no other way
| to get the functionality. It's all about usability, readability
| and PEP20 ("Beautiful is better than ugly.").

I think I find the direct function call more readable and prettier.

tjr


From mal at egenix.com  Mon May 19 21:22:57 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 19 May 2008 21:22:57 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <g0sfv4$30r$1@ger.gmane.org>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com><482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz><482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz><482D94CF.7090107@gmail.com>	<482FDB1E.3010303@v.loewis.de><ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>	<4831B2F9.8040001@egenix.com>
	<g0sfv4$30r$1@ger.gmane.org>
Message-ID: <4831D391.8030008@egenix.com>

On 2008-05-19 20:12, Terry Reedy wrote:
> IOW, I think .transform may be the wrong solution to library 
> disorganization.

Those methods are not meant to help with the library reorg.

They are needed as an easy way to access codecs that perform
str->str or bytes->bytes encoding/decoding, e.g. for escaping
text ('unicode-printable', 'xml-escape').

I'm using gzip, uu or base64 as examples, since those codecs
already exist in Python 2.x and currently cannot be used in
Python 3.x due to the type restrictions on the .encode() and
.decode() methods.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 19 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From martin at v.loewis.de  Mon May 19 23:03:23 2008
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 19 May 2008 23:03:23 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4831B2F9.8040001@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>
	<482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz>
	<482D94CF.7090107@gmail.com>	<482FDB1E.3010303@v.loewis.de>
	<ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>
	<4831B2F9.8040001@egenix.com>
Message-ID: <4831EB1B.4080404@v.loewis.de>

> They are convenience methods to the codecs registry
> with the added benefit of applying type checks which the codecs
> registry does not guarantee since it only manages codecs.

I argue that things that could be parameters to .transform don't
belong in the codec registry in the first place.

> Of course, you can write everything directly against the codec
> registry or some other specialized interface, but that's not
> really what we're after here.

No need for writing directly against the codec registry.

Using some other specialized interface: yes, Yes, YES!

> Motivation: When was the last time you used a gzip compression
> option (ie. yes there are options, but do you use them in the
> general use case) ?

Depends on what I do: when I invoke gzip from the command line,
I pass -9 all the time, as a habit.

Or did you mean "in Python"? It's been a long time since I needed to
use the gzip module at all; and the last few times, I suppose
it was always through the tarfile module.

I use gzip so rarely that I find it wasteful that it gets its
own shortcut. If I had a (half-serious) wish for a string
method shortcut, it would be

"GET / HTTP/1.0\r\n\r\n".sendto("foo.bar.com", 80)

Perhaps I should write a codec for that:

"GET / HTTP/1.0\r\n\r\n".encode("http:foo.bar.com")

which sends the request and returns the response :-)

> Can you write code that applies UU encoding
> without looking up the details in the documentation (ie. there
> is a module for doing UU-encoding in the stdlib, but what's its
> name, what's the function, does it need extra logic) ?

You mean, without looking into the HTML documentation? Sure enough.
"import uu" I remember, then I do help(uu), scroll to the end.

If you can't remember that the module's name is uu, then you probably
can't remember the codec, either:

py> "foo".encode("uuencode")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: uuencode

Regards,
Martin


From martin at v.loewis.de  Mon May 19 23:15:45 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 19 May 2008 23:15:45 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4831B722.3070707@cornell.edu>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz>	<482D94CF.7090107@gmail.com>
	<482FDB1E.3010303@v.loewis.de> <4831B722.3070707@cornell.edu>
Message-ID: <4831EE01.7020704@v.loewis.de>

>>   output_to = compressors[name](compresslevel=complevel)
>>
> Your example seems to indicate a model->sequence operation, that I would
> call 'encode'.  Now the question becomes, given 'f', what makes more sense:
> 
>     (a)  y = x.transform(f)
>     (b)  y = x.encode(f)
>     (c)  y = f(x)
> 
> What do you expect the function signature of 'output_to' to be?

People brought that up in the context of stacking streams. So output_to
would have a stream interface, so you would say

   (d) output_to.write(x)

(and yes, I do recognize that the ultimate receiver of the output,
e.g. the socket or such, is missing in my API)
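
A minimal sketch of such a stacked stream, with the missing receiver
added back as an explicit argument. The compressors table and its
factory signature are assumptions made for this example; an io.BytesIO
stands in for the socket.

```python
import bz2
import gzip
import io

# Hypothetical registry: each entry builds a writable stream on top of `sink`.
compressors = {
    "gzip": lambda sink, compresslevel: gzip.GzipFile(
        fileobj=sink, mode="wb", compresslevel=compresslevel),
    "bz2": lambda sink, compresslevel: bz2.BZ2File(
        sink, mode="wb", compresslevel=compresslevel),
}

x = b"some payload " * 100
sink = io.BytesIO()              # stands in for the socket or file

output_to = compressors["gzip"](sink, compresslevel=9)
output_to.write(x)               # (d) from the message above
output_to.close()                # finishes the compressed stream; sink stays open

assert gzip.decompress(sink.getvalue()) == x
```

The compression level is a constructor parameter of the stream here,
which is the kind of option a bare registry name cannot carry.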

> Is it
> callable?  Is it something that is going to be a stream wrapper, that
> has .read() and .write()?

That's what I meant it to be.

I'm not quite sure why you are asking these questions.

> In this case using .transform() would seem to be a good fit because
> there is no model, but 'obj' suffers from being directionless, so it
> becomes this...
> 
>     ciph = plain.transform(obj.encrypt)
> 
> ...which isn't substantially clearer than...
> 
>     ciph = obj.encrypt(plain)

It isn't substantially clearer, and *therefore* it is a good fit???

> Parametric transformations don't bother me, but that would be an
> indication that there's a lot more going on, and perhaps there are
> better (and pre-existing) labels for these functions.

If you are saying that we should call it .encrypt, not .transform:
I completely agree.

Regards,
Martin

From stephen at xemacs.org  Tue May 20 00:27:46 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 20 May 2008 07:27:46 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com>
	<482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com>
	<482FDB1E.3010303@v.loewis.de>
	<ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>
Message-ID: <87zlqmdj4t.fsf@uwakimon.sk.tsukuba.ac.jp>

Guido van Rossum writes:

 > Hm, Martin is pretty convincing here. Before we go ahead and accept
 > .transform() and friends (by whatever name) we should look for
 > convincing use cases where the transformation is typically given by
 > some other input, rather than hard-coded in the app. (And cases where
 > there are two or three possibilities from a fixed menu don't count --
 > so that would rule out Content-transfer-encoding.)

I don't understand the motivation for this restriction.  I think we do
not want to share names across categories, so the size of any given
category is not important, it's the whole registry that is useful.  If
people want to filter on category, the registry entries could be given
a 'category' attribute.

Aside from that, the kind of application I have in mind is indeed
something like the email module and its clients (like Mailman).
Things like

language_charset_map = { 'japanese' : 'iso-2022-jp',
                         'english' : 'iso-8859-1',
                         'russian' : 'koi8-r',
                         ... }

charset_transfer_encoding_map = { 'iso-2022-jp' : 'base64',
                                  'iso-8859-1' : 'quoted-printable',
                                  'koi8-r' : 'base64',
                                  ... }

mime_type_compression_map = { 'text/plain' : None,
                              'image/bmp' : 'gzip',
                              ... }

with the almost obvious definition of transform_mime_body().
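
One possible reading of that definition, as a rough sketch only. The
transforms table, the function signature, and the order of the steps
(compress first, then apply the transfer encoding) are assumptions for
illustration; the maps are abbreviated versions of the ones above.

```python
import base64
import gzip
import quopri

language_charset_map = {"japanese": "iso-2022-jp",
                        "english": "iso-8859-1",
                        "russian": "koi8-r"}

charset_transfer_encoding_map = {"iso-2022-jp": "base64",
                                 "iso-8859-1": "quoted-printable",
                                 "koi8-r": "base64"}

mime_type_compression_map = {"text/plain": None,
                             "image/bmp": "gzip"}

# Hypothetical stand-in for the registry: string name -> bytes-to-bytes function.
transforms = {"base64": base64.encodebytes,
              "quoted-printable": quopri.encodestring,
              "gzip": gzip.compress}


def transform_mime_body(text, language, mime_type="text/plain"):
    """Encode text for a list's language, driven only by the string tables."""
    charset = language_charset_map[language]
    body = text.encode(charset)
    compression = mime_type_compression_map.get(mime_type)
    if compression is not None:          # None means "do not compress"
        body = transforms[compression](body)
    return transforms[charset_transfer_encoding_map[charset]](body)


russian_body = transform_mime_body("\u043f\u0440\u0438\u0432\u0435\u0442", "russian")
assert base64.decodebytes(russian_body).decode("koi8-r") == "\u043f\u0440\u0438\u0432\u0435\u0442"
```

Every value an admin would edit is a plain string or None; the only
Python identifiers live in the one transforms table.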

This kind of table is often given in a file accessed by non-Python-
programmers.  For example, for encodings that are not mostly ASCII,
gzipped base64 may be a very economical way to transmit (and store) a
text part.  However, a non-English list that transmits a lot of code
might prefer quoted-printable to allow the code to be analyzed by some
kind of robot (obviously a legacy app!), and many lists will have
strong preferences between UTF-8 and a legacy encoding.  Japanese
companies often have corporate encodings containing characters not
available in JIS (and sometimes not in Unicode).  A list dedicated to
image processing may want to add image/* formats that haven't yet been
registered with the IANA, etc.

On the Mailman lists it is a FAQ that people don't understand the
difference between 'None' and None.  I don't think we can avoid None,
True, and False, but for many Mailman admins the difference between
'gzip' and Compressors.gzip.compress is non-obvious and annoying.
Giving string names to all these transforms would make the
administration interface perceptibly more regular.

On the other hand, suppose we have a web interface for configuration
so that the admins don't ever see the difference between a codec
registry key and a Python identifier.  Do we want to expose all the
possible compressors, codecs, transfer encodings, and what not in the
module that provides the configuration UI so that the list of names
can be provided?  How does the web interface avoid needing to know all
of those in advance?  How does the web interface know which functions
are which (eg, compressor v. decompressor)?

Of course the same questions apply to a registry, but as functionality
(answers to those questions) is added to the registry, the changes
needed to take advantage of it are much more localized and less
invasive than, say, requiring "compressors" to provide "compress" and
"uncompress" functions or methods, and a standard set of options.

The main thing that I sympathize with in Martin's post is the issue of
options to transforms, but it seems to me that keyword arguments deal
with that clearly and flexibly.


From guido at python.org  Tue May 20 00:32:43 2008
From: guido at python.org (Guido van Rossum)
Date: Mon, 19 May 2008 15:32:43 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87zlqmdj4t.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz> <482C11B8.3010505@gmail.com>
	<482CC90A.5010907@canterbury.ac.nz> <482D94CF.7090107@gmail.com>
	<482FDB1E.3010303@v.loewis.de>
	<ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>
	<87zlqmdj4t.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <ca471dc20805191532v4a361225ndb40e87b2e4891b3@mail.gmail.com>

On Mon, May 19, 2008 at 3:27 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Guido van Rossum writes:
>
>  > Hm, Martin is pretty convincing here. Before we go ahead and accept
>  > .transform() and friends (by whatever name) we should look for
>  > convincing use cases where the transformation is typically given by
>  > some other input, rather than hard-coded in the app. (And cases where
>  > there are two or three possibilities from a fixed menu don't count --
>  > so that would rule out Content-transfer-encoding.)
>
> I don't understand the motivation for this restriction.  I think we do
> not want to share names across categories, so the size of any given
> category is not important, it's the whole registry that is useful.  If
> people want to filter on category, the registry entries could be given
> a 'category' attribute.
>
> Aside from that, the kind of application I have in mind is indeed
> something like the email module and its clients (like Mailman).
> Things like
>
> language_charset_map = { 'japanese' : 'iso-2022-jp',
>                         'english' : 'iso-8859-1',
>                         'russian' : 'koi8-r',
>                         ... }
>
> charset_transfer_encoding_map = { 'iso-2022-jp' : 'base64',
>                                  'iso-8859-1' : 'quoted-printable',
>                                  'koi8-r' : 'base64',
>                                  ... }
>
> mime_type_compression_map = { 'text/plain' : None,
>                              'img/bmp' : 'gzip',
>                              ... }
>
> with the almost obvious definition of transform_mime_body().
>
> This kind of table is often given in a file accessed by non-Python-
> programmers.  For example, for encodings that are not mostly ASCII,
> gzipped base64 may be a very economical way to transmit (and store) a
> text part.  However, a non-English list that transmits a lot of code
> might prefer quoted-printable to allow the code to be analyzed by some
> kind of robot (obviously a legacy app!), and many lists will have
> strong preferences between UTF-8 and a legacy encoding.  Japanese
> companies often have corporate encodings containing characters not
> available in JIS (and sometimes not in Unicode).  A list dedicated to
> image processing may want to add image/* formats that haven't yet been
> registered with the IANA, etc.
>
> On the Mailman lists it is a FAQ that people don't understand the
> difference between 'None' and None.  I don't think we can avoid None,
> True, and False, but for many Mailman admins the difference between
> 'gzip' and Compressors.gzip.compress is non-obvious and annoying.
> Giving string names to all these transforms would make the
> administration interface perceptibly more regular.

There's no reason that for this pretty unusual and specific case you
couldn't have your own function that is controlled by the string value
read from the map edited by the list admin.

I think the real abomination here is to expect list admins to use
Python syntax at all.



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From turnbull at sk.tsukuba.ac.jp  Tue May 20 01:07:55 2008
From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull)
Date: Tue, 20 May 2008 08:07:55 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4831A267.5000304@cornell.edu>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <g0f42q$29h$1@ger.gmane.org>
	<482B203B.3080305@egenix.com> <482B5BF4.1090007@gmail.com>
	<ca471dc20805141442s7b54ac78y4074f995694df6ef@mail.gmail.com>
	<482C0707.8020805@egenix.com> <482CC795.1050405@canterbury.ac.nz>
	<ca471dc20805151646j189418ddh14803d7982ea7821@mail.gmail.com>
	<482CF57D.6010200@canterbury.ac.nz> <g0j5k6$i45$1@ger.gmane.org>
	<482D3861.8050100@canterbury.ac.nz> <482D8EC3.1050303@cornell.edu>
	<87wsltsut2.fsf@uwakimon.sk.tsukuba.ac.jp>
	<4831A267.5000304@cornell.edu>
Message-ID: <87y765evuc.fsf@uwakimon.sk.tsukuba.ac.jp>

Joel Bender writes:

A lot, but I don't understand why.  You seem to have a completely
different pattern (and Python 2, not Python 3) in mind, but in fact as
far as I can see the only point of conflict is that if the "registry
of string names" proposal were adopted, you'd have trouble using the
method name 'transform' as you would like to.

There's nothing in the registry proposal that prevents you from
calling functions by name, or writing polymorphic transformers, etc.


From greg.ewing at canterbury.ac.nz  Tue May 20 03:59:22 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 20 May 2008 13:59:22 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4831B2F9.8040001@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<482BF0A6.70602@canterbury.ac.nz>
	<482C11B8.3010505@gmail.com> <482CC90A.5010907@canterbury.ac.nz>
	<482D94CF.7090107@gmail.com> <482FDB1E.3010303@v.loewis.de>
	<ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>
	<4831B2F9.8040001@egenix.com>
Message-ID: <4832307A.6040609@canterbury.ac.nz>

M.-A. Lemburg wrote:
> It's being able to write
> 
>     str.transform('gzip').transform('uu')
> 
> which doesn't require knowledge about the modules doing the actual
> work behind the scenes.

That doesn't preclude those modules exporting their
functionality in the form of codecs having the standard
codec interface.

There are two independent issues here:

1) Should the functionality be provided in the form
    of a codec? (Yes, that's fine, IMO.)

2) Should all codecs live in a central registry and
    be callable via methods on strings and bytes?
    (I'm not convinced that's the case.)

-- 
Greg

From greg.ewing at canterbury.ac.nz  Tue May 20 04:10:33 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 20 May 2008 14:10:33 +1200
Subject: [Python-3000] Metaclass Vs Class Decorator
In-Reply-To: <fa7d4c4f0805191034l239ffc7al65d54a44355ba773@mail.gmail.com>
References: <fa7d4c4f0805181433u536fdb0fm5c1416db98986a0c@mail.gmail.com>
	<g0q7f2$f90$1@ger.gmane.org>
	<00be01c8b961$95b73fc0$ac00a8c0@RaymondLaptop1>
	<ca471dc20805190910x48fd34ccpbf14e6b7cd8d1477@mail.gmail.com>
	<fa7d4c4f0805191034l239ffc7al65d54a44355ba773@mail.gmail.com>
Message-ID: <48323319.7080901@canterbury.ac.nz>

paul bedaride wrote:

> it's why I wonder, if this can't be good if metaclass and class 
> decorator have the same
> interface, then we can use a class as a metaclass or as a decorator ??

That doesn't make sense -- metaclasses and class decorators
are very different things and have very different capabilities.

There is some overlap between the things they can do, but
trying to unify them would be a mistake.
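
A small sketch of one such difference, using Python 3 syntax. The names
register, RegisteringMeta and registry are made up for this example: a
class decorator runs once on the class it decorates, while a metaclass
is inherited and also runs for every subclass.

```python
registry = []


def register(cls):
    """Class decorator: called once, with the finished class."""
    registry.append(cls.__name__)
    return cls


class RegisteringMeta(type):
    """Metaclass: takes part in creating the class and all its subclasses."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        registry.append(name)


@register
class A:
    pass


class B(A):          # the decorator is not inherited, so B is not registered
    pass


class C(metaclass=RegisteringMeta):
    pass


class D(C):          # the metaclass is inherited, so D is registered
    pass


assert registry == ["A", "C", "D"]
```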

-- 
Greg

From mal at egenix.com  Tue May 20 12:06:38 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 20 May 2008 12:06:38 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4832307A.6040609@canterbury.ac.nz>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<482C11B8.3010505@gmail.com>
	<482CC90A.5010907@canterbury.ac.nz>	<482D94CF.7090107@gmail.com>
	<482FDB1E.3010303@v.loewis.de>	<ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>	<4831B2F9.8040001@egenix.com>
	<4832307A.6040609@canterbury.ac.nz>
Message-ID: <4832A2AE.7090700@egenix.com>

On 2008-05-20 03:59, Greg Ewing wrote:
> M.-A. Lemburg wrote:
>> It's being able to write
>>
>>     str.transform('gzip').transform('uu')
>>
>> which doesn't require knowledge about the modules doing the actual
>> work behind the scenes.
> 
> That doesn't preclude those modules exporting their
> functionality in the form of codecs having the standard
> codec interface.

Note that all codecs we currently have in Python are in fact
modules that you can import and use directly - even subclass
to provide more or altered functionality, e.g.

     from encodings import latin_1

will give you direct access to the Latin-1 codec.

You seem to be worried that the functionality is supposed
to be buried deep in some codec registry - that's not the
case.

The codec registry only takes care of finding a codec
interface given a name, nothing more.
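
As a minimal sketch in current Python 3: a registry lookup returns the
codec's interface, and the same codec can be used directly from its
module without going through the registry at all.

```python
import codecs
from encodings import latin_1

# Registry route: a name goes in, a CodecInfo object comes out.
info = codecs.lookup("latin-1")
encoded, consumed = info.encode("caf\u00e9")
assert (encoded, consumed) == (b"caf\xe9", 4)

# Direct route: import the codec module and use its Codec class.
assert latin_1.Codec().encode("caf\u00e9") == (b"caf\xe9", 4)

# An unregistered name fails at lookup time.
try:
    codecs.lookup("uuencode")
    found = True
except LookupError:
    found = False
assert found is False
```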

Also note that I'm not suggesting to remove any of the
existing implementations of specialized interfaces for
e.g. compression or base64 encoding. The codecs for these
only use these interfaces without assimilating them :-)

> There are two independent issues here:
> 
> 1) Should the functionality be provided in the form
>    of a codec? (Yes, that's fine, IMO.)
> 
> 2) Should all codecs live in a central registry and
>    be callable via methods on strings and bytes?
>    (I'm not convinced that's the case.)

I think there's a misunderstanding here in how codecs work.

Codecs exist to provide a consistent and well-defined interface
to a wide range of encoding and decoding applications.

They are not trying to:

  * compete with specialized interfaces

  * replace specialized interfaces

  * hide specialized interfaces from the user

-- 
Marc-Andre Lemburg
eGenix.com


From mal at egenix.com  Tue May 20 12:19:40 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 20 May 2008 12:19:40 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4831EB1B.4080404@v.loewis.de>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz>	<482D94CF.7090107@gmail.com>	<482FDB1E.3010303@v.loewis.de>	<ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>	<4831B2F9.8040001@egenix.com>
	<4831EB1B.4080404@v.loewis.de>
Message-ID: <4832A5BC.60704@egenix.com>

On 2008-05-19 23:03, Martin v. Löwis wrote:
>> They are convenience methods to the codecs registry
>> with the added benefit of applying type checks which the codecs
>> registry does not guarantee since it only manages codecs.
> 
> I argue that things that could be parameters to .transform don't
> belong in the codec registry in the first place.
> 
>> Of course, you can write everything directly against the codec
>> registry or some other specialized interface, but that's not
>> really what we're after here.
> 
> No need for writing directly against the codec registry.
> 
> Using some other specialized interface: yes, Yes, YES!

So you would like to force users to write e.g.

def uu(input,errors='strict',filename='<data>',mode=0666):
     from cStringIO import StringIO
     from binascii import b2a_uu
     # using str() because of cStringIO's undesired Unicode behavior.
     infile = StringIO(str(input))
     outfile = StringIO()
     read = infile.read
     write = outfile.write

     # Encode
     write('begin %o %s\n' % (mode & 0777, filename))
     chunk = read(45)
     while chunk:
         write(b2a_uu(chunk))
         chunk = read(45)
     write(' \nend\n')

     return outfile.getvalue()

(this is adapted Py2 code taken from the uu codec)

instead of writing

output = input.transform('uu')

Fair enough, I've noted your -1.

Still, I don't think the specialized interfaces are very user-friendly.
They do serve their purpose, but common usage just doesn't really bother
with all those details.

And it doesn't end there...

You have to look up, implement and test a similar standardizing function
for all other specialized interfaces you want to use as well - more or
less reinventing the codec interface for every application you write.

Anyway, even without a .transform() method, you can still do:

import codecs
output = codecs.encode(input, 'uu')

However, you then have to do the type checking yourself.
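
A sketch of what "doing the type checking yourself" could look like in
a current Python 3, assuming a release where the bytes-to-bytes codecs
"zlib_codec" and "base64_codec" are registered. The helper name
transform is hypothetical and is not the proposed method.

```python
import codecs


def transform(data, name):
    """bytes -> bytes transform through the codec registry (illustrative)."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("transform() expects bytes, got %s"
                        % type(data).__name__)
    result = codecs.encode(data, name)
    if not isinstance(result, bytes):
        raise TypeError("codec %r did not return bytes" % name)
    return result


payload = b"hello world " * 10
packed = transform(transform(payload, "zlib_codec"), "base64_codec")

# Undo both steps with codecs.decode to check the round trip.
unpacked = codecs.decode(codecs.decode(packed, "base64_codec"), "zlib_codec")
assert unpacked == payload
```

The two isinstance checks are the part a built-in method would have
done on the caller's behalf.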

-- 
Marc-Andre Lemburg
eGenix.com


From martin at v.loewis.de  Tue May 20 20:35:23 2008
From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 20 May 2008 20:35:23 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4832A5BC.60704@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<482BF0A6.70602@canterbury.ac.nz>	<482C11B8.3010505@gmail.com>	<482CC90A.5010907@canterbury.ac.nz>	<482D94CF.7090107@gmail.com>	<482FDB1E.3010303@v.loewis.de>	<ca471dc20805190814j38595f6atf4c12232098483ad@mail.gmail.com>	<4831B2F9.8040001@egenix.com>
	<4831EB1B.4080404@v.loewis.de> <4832A5BC.60704@egenix.com>
Message-ID: <483319EB.4090904@v.loewis.de>

> So you would like to force users to write e.g.
> 
> def uu(input,errors='strict',filename='<data>',mode=0666):
>     from cStringIO import StringIO
>     from binascii import b2a_uu
>     # using str() because of cStringIO's undesired Unicode behavior.
>     infile = StringIO(str(input))
>     outfile = StringIO()
>     read = infile.read
>     write = outfile.write
> 
>     # Encode
>     write('begin %o %s\n' % (mode & 0777, filename))
>     chunk = read(45)
>     while chunk:
>         write(b2a_uu(chunk))
>         chunk = read(45)
>     write(' \nend\n')
> 
>     return outfile.getvalue()
> 
> (this is adapted Py2 code taken from the uu codec)

No. I would just use uu.encode instead, which already does the loop,
and everything else. So if I really wanted a string-to-string
conversion, I would do

  infile = StringIO(input)
  outfile = StringIO()
  uu.encode(infile, outfile)
  output = outfile.getvalue()

More likely, I have file-like objects already, in which case I won't
need to create StringIO objects.
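
For readers on a recent Python 3, where the uu module is no longer in
the standard library, the same file-to-file shape can be sketched with
binascii alone. The function name uu_encode and its defaults are
assumptions modelled on the code quoted above, not an existing API.

```python
import binascii
import io


def uu_encode(infile, outfile, name="<data>", mode=0o666):
    """UU-encode a binary file-like object into another, 45 bytes per line."""
    header = "begin %o %s\n" % (mode & 0o777, name)
    outfile.write(header.encode("ascii"))
    chunk = infile.read(45)
    while chunk:
        outfile.write(binascii.b2a_uu(chunk))
        chunk = infile.read(45)
    outfile.write(b" \nend\n")


# String-to-string use needs the in-memory wrappers, as in the message above.
infile = io.BytesIO(b"hello world")
outfile = io.BytesIO()
uu_encode(infile, outfile)
output = outfile.getvalue()

lines = output.splitlines(keepends=True)
assert binascii.a2b_uu(lines[1]) == b"hello world"
```

With real files or sockets already in hand, the two BytesIO objects
disappear and only the uu_encode call remains.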

Regards,
Martin

From stefan_ml at behnel.de  Thu May 22 10:05:59 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 22 May 2008 10:05:59 +0200
Subject: [Python-3000] Cython code generation for Py3 complete
Message-ID: <g139h8$b42$1@ger.gmane.org>

Hi,

just a quick announcement that I finished the port of the Cython compiler to
Py3. While you cannot currently run Cython itself in Py3, you can build the
generated C sources unchanged under Py2.3 through 3.0a5.

    http://cython.org/

There isn't a release yet (though there will hopefully be one soon), but I
would be happy if interested people could already give it some testing. So if
you have some Pyrex sources lying around and want them to run on Python 3k,
please give it a try and report any problems you find to the Cython mailing list.

You can get the compiler from the public Mercurial repository:

    http://hg.cython.org/cython-devel/

and I have put up a developer snapshot here:

http://codespeak.net/lxml/dev/Cython-0.9.6.14-3k.tar.gz

Hoping for some feedback,

Stefan


From solipsis at pitrou.net  Thu May 22 13:58:10 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 22 May 2008 11:58:10 +0000 (UTC)
Subject: [Python-3000] PEP 3138- String representation in Python 3000
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
	<482C1293.3030409@egenix.com>
Message-ID: <loom.20080522T115632-47@post.gmane.org>

M.-A. Lemburg <mal <at> egenix.com> writes:
> 
> It's all a matter of perspective. You can say you're encoding Latin-1
> to Unicode, or you can say you're encoding Unicode to Latin-1.

Except that Latin-1 is an encoding while Unicode is not. So I don't see how you
can encode to Unicode. Of course you can encode to UTF-8, UTF-16, etc. - which
/are/ encodings (and, in this case, Python returns you a bytes object :-)).
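
A short illustration of that distinction in Python 3: the str holds a
code point, and each named encoding maps it to a different byte
sequence.

```python
text = "\u00e9"                                   # one code point, U+00E9

assert ord(text) == 0xE9                          # the abstract code point
assert text.encode("latin-1") == b"\xe9"          # one byte in Latin-1
assert text.encode("utf-8") == b"\xc3\xa9"        # two bytes in UTF-8
assert text.encode("utf-16-le") == b"\xe9\x00"    # two bytes in UTF-16 (LE)

# Decoding goes the other way: bytes plus an encoding name give back a str.
assert b"\xe9".decode("latin-1") == text
assert isinstance(text.encode("utf-8"), bytes)
```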

Antoine.


From mal at egenix.com  Thu May 22 14:27:19 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 22 May 2008 14:27:19 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <loom.20080522T115632-47@post.gmane.org>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>	<482C1293.3030409@egenix.com>
	<loom.20080522T115632-47@post.gmane.org>
Message-ID: <483566A7.6050106@egenix.com>

On 2008-05-22 13:58, Antoine Pitrou wrote:
> M.-A. Lemburg <mal <at> egenix.com> writes:
>> It's all a matter of perspective. You can say you're encoding Latin-1
>> to Unicode, or you can say you're encoding Unicode to Latin-1.
> 
> Except that Latin-1 is an encoding while Unicode is not. So I don't see how you
> can encode to Unicode. Of course you can encode to UTF-8, UTF-16, etc. - which
> /are/ encodings (and, in this case, Python returns you a bytes object :-)).

Well, yes and no :-)

Unicode does encode a way to describe code points. The assignments
of integers to letters, symbols, etc. (i.e. a "character set")
provides the encoding, so you can call it "encoding" as well.

OTOH, Unicode is the mother of all character sets so to speak (even
though in this case, many children existed before the mother was
formed ;-), so it has a special status.

In practice the terms "encoding" and "character set" are often
used interchangeably, just as most people talk about "characters"
when referring to "code points" and/or "glyphs", or happily mix
"UTF-8", "UTF-16" and "Unicode".

The Unicode consortium usually uses the terms "UCS2" and "UCS4"
when referring to Unicode as "character set", but even there
you have an ordering which makes it an encoding.

See my talk on Unicode for some clarification:

http://www.egenix.com/library/presentations/

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 22 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From stephen at xemacs.org  Thu May 22 19:52:52 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 23 May 2008 02:52:52 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <483566A7.6050106@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482066E3.7030209@gmail.com> <482335CE.7000309@egenix.com>
	<fvvpc7$32i$1@ger.gmane.org> <48242D4A.3060802@egenix.com>
	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
	<482C1293.3030409@egenix.com>
	<loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
Message-ID: <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>

M.-A. Lemburg writes:
 > On 2008-05-22 13:58, Antoine Pitrou wrote:
 > > M.-A. Lemburg <mal <at> egenix.com> writes:
 > >> It's all a matter of perspective. You can say you're encoding Latin-1
 > >> to Unicode, or you can say you're encoding Unicode to Latin-1.
 > > 
 > > Except that Latin-1 is an encoding while Unicode is not.
 > 
 > Well, yes and no :-)
 > 
 > Unicode does encode a way to describe code points.

I don't think this is a useful POV in the context of Python, where
'unicode' is a primitive type, and not implemented as an array of
(Python) integers.

From guido at python.org  Thu May 22 19:55:01 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 22 May 2008 10:55:01 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
	<482C1293.3030409@egenix.com> <loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>

Hi folks,

Is this thread reaching a conclusion yet? I am hoping I can soon
accept some variant of the following:

1. repr() returns a Unicode string containing only printable Unicode
characters, using \x\u\U escapes for characters that are not
considered printable according to some version of the Unicode standard
augmented with some Python practicality, but unaffected by platform or
locale. This can be implemented efficiently, without having to load
the whole Unicode database, at least for strings containing only a
large subset of the Unicode character set (e.g. all of UCS2, and
possibly whole ranges of UCS4).

2. If you don't want any non-ASCII printed to a file, set the file's
encoding to ASCII and the error handler to backslashescape.
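
A rough sketch of both points, assuming a Python 3 interpreter in which repr()
already behaves as described in point 1. The sample string and the in-memory
"file" are illustrative only, and the standard codec error handler for this
purpose is spelled "backslashreplace".

```python
import io

# e-acute, a katakana letter, and the BEL control character
s = "caf\u00e9 \u30d1\x07"

# Point 1: printable non-ASCII characters survive repr();
# the non-printable control character is escaped.
r = repr(s)
print(r)

# Point 2: an ASCII text stream with the "backslashreplace" error handler
# escapes whatever ASCII cannot represent instead of raising.
buf = io.BytesIO()
out = io.TextIOWrapper(buf, encoding="ascii", errors="backslashreplace")
out.write(s)
out.flush()
print(buf.getvalue())
```

Note that the error handler only acts on characters the encoding cannot
represent; BEL is valid ASCII, so it is written to the stream unchanged.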

But as I haven't followed the thread I may be way off.

Is Martin's proposal to allow forcing the default stdin/stdout/stderr
encodings through environment variables related? (It should allow for
setting the error handler too.)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mal at egenix.com  Thu May 22 21:09:25 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 22 May 2008 21:09:25 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>
	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>
	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>	<482C1293.3030409@egenix.com>	<loom.20080522T115632-47@post.gmane.org>	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <4835C4E5.1070407@egenix.com>

On 2008-05-22 19:52, Stephen J. Turnbull wrote:
> M.-A. Lemburg writes:
>  > On 2008-05-22 13:58, Antoine Pitrou wrote:
>  > > M.-A. Lemburg <mal <at> egenix.com> writes:
>  > >> It's all a matter of perspective. You can say you're encoding Latin-1
>  > >> to Unicode, or you can say you're encoding Unicode to Latin-1.
>  > > 
>  > > Except that Latin-1 is an encoding while Unicode is not.
>  > 
>  > Well, yes and no :-)
>  > 
>  > Unicode does encode a way to describe code points.
> 
> I don't think this is a useful POV in the context of Python, where
> 'unicode' is a primitive type, and not implemented as an array of
> (Python) integers.

Agreed.

I was just explaining where the whole notion of encoding
and decoding originates and how the meaning of the .encode()
and .decode() methods came to be.

-- 
Marc-Andre Lemburg
eGenix.com


From mal at egenix.com  Thu May 22 21:11:56 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 22 May 2008 21:11:56 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>	<482C1293.3030409@egenix.com>
	<loom.20080522T115632-47@post.gmane.org>	<483566A7.6050106@egenix.com>	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
Message-ID: <4835C57C.9010007@egenix.com>

On 2008-05-22 19:55, Guido van Rossum wrote:
> Hi folks,
> 
> Is this thread reaching a conclusion yet? I am hoping I can soon
> accept some variant of the following:
> 
> 1. repr() returns a Unicode string containing only printable Unicode
> characters, using \x\u\U escapes for characters that are not
> considered printable according to some version of the Unicode standard
> augmented with some Python practicality, but unaffected by platform or
> locale. This can be implemented efficiently, without having to load
> the whole Unicode database, at least for strings containing only a
> large subset of the Unicode character set (e.g. all of UCS2, and
> possibly whole ranges of UCS4).
> 
> 2. If you don't want any non-ASCII printed to a file, set the file's
> encoding to ASCII and the error handler to backslashescape.

Sounds like a good compromise.

Just please don't set the error handler of sys.stdout to anything but
"strict" per default.

> But as I haven't followed the thread I may be way off.
> 
> Is Martin's proposal to allow forcing the default stdin/stdout/stderr
> encodings through environment variables related? (It should allow for
> setting the error handler too.)

It's not related, but would be very helpful on its own, esp. for
the stdin part in 3.x.

-- 
Marc-Andre Lemburg
eGenix.com


From solipsis at pitrou.net  Thu May 22 21:59:09 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Thu, 22 May 2008 21:59:09 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482B11F8.2090200@egenix.com> <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
	<482C1293.3030409@egenix.com> <loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
Message-ID: <1211486349.5825.14.camel@fsol>

On Thursday, 22 May 2008 at 10:55 -0700, Guido van Rossum wrote:
> Hi folks,
> 
> Is this thread reaching a conclusion yet? I am hoping I can soon
> accept some variant of the following:
> 
> 1. repr() returns a Unicode string containing only printable Unicode
> characters, using \x\u\U escapes for characters that are not
> considered printable according to some version of the Unicode standard
> augmented with some Python practicality, but unaffected by platform or
> locale. This can be implemented efficiently, without having to load
> the whole Unicode database, at least for strings containing only a
> large subset of the Unicode character set (e.g. all of UCS2, and
> possibly whole ranges of UCS4).
> 
> 2. If you don't want any non-ASCII printed to a file, set the file's
> encoding to ASCII and the error handler to backslashescape.

Since some people still seem wary that repr() might return non-ascii
results, perhaps we could also:

3. Add a builtin function named ascii() and a formatting code "%a" that
both call repr() internally and then convert all non-ascii characters to
\uXXXX escapes.
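
A short sketch of how such a function and formatting code would behave,
assuming an interpreter that provides the ascii() builtin and the "%a" code
as proposed here (released Python 3 versions do). The sample string is
illustrative. In the shipped implementation, code points below U+0100 come
out as \xXX rather than \uXXXX.

```python
s = "caf\u00e9 \u30d1"

plain = repr(s)       # keeps printable non-ASCII characters
safe = ascii(s)       # escapes every non-ASCII character
formatted = "%a" % s  # same conversion through the formatting code

print(plain)
print(safe)
print(formatted)
```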

2to3 might even replace all occurrences of repr() by ascii(), to err on
the safe side.

Regards

Antoine.


From martin at v.loewis.de  Thu May 22 22:38:47 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 22 May 2008 22:38:47 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <483566A7.6050106@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482066E3.7030209@gmail.com>	<482335CE.7000309@egenix.com>	<fvvpc7$32i$1@ger.gmane.org>	<48242D4A.3060802@egenix.com>	<ca471dc20805090906h4fa92f79p49132fcf349e74c6@mail.gmail.com>	<482B11F8.2090200@egenix.com>	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>	<482C1293.3030409@egenix.com>	<loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
Message-ID: <4835D9D7.7040809@v.loewis.de>

> The Unicode consortium usually uses the terms "UCS2" and "UCS4"
> when referring to Unicode as "character set", but even there
> you have an ordering which makes it an encoding.

The Unicode consortium uses the term "coded character set" to describe
the assignment of characters in the set to numbers, and "character
encoding scheme" to refer to an algorithm that produces a sequence of
bytes, and doesn't use the term "encoding" altogether, see

http://www.unicode.org/unicode/reports/tr17/
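
As a small illustration of the two terms in Python 3 (the character chosen is
arbitrary): the coded character set gives the character a number, and each
character encoding scheme serializes that number as a different byte sequence.

```python
ch = "\u00e9"    # LATIN SMALL LETTER E WITH ACUTE

# Coded character set: the character is assigned a number (its code point).
code_point = ord(ch)
print(hex(code_point))    # 0xe9

# Character encoding schemes: algorithms that turn that number into bytes.
serialized = {
    scheme: ch.encode(scheme)
    for scheme in ("utf-8", "utf-16-le", "utf-16-be", "utf-32-be")
}
for scheme, data in serialized.items():
    print(scheme, data)
```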

Regards,
Martin

From martin at v.loewis.de  Thu May 22 22:41:34 2008
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Thu, 22 May 2008 22:41:34 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<482B11F8.2090200@egenix.com>
	<482B80D5.8000202@canterbury.ac.nz>	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>	<482C1293.3030409@egenix.com>
	<loom.20080522T115632-47@post.gmane.org>	<483566A7.6050106@egenix.com>	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
Message-ID: <4835DA7E.40304@v.loewis.de>

> Is Martin's proposal to allow forcing the default stdin/stdout/stderr
> encodings through environment variables related? (It should allow for
> setting the error handler too.)

It's related only if it supports setting the error handler as well.

Would "encoding/errorhandler" sound like a useful syntax?

Regards,
Martin

From phd at phd.pp.ru  Thu May 22 22:56:36 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Fri, 23 May 2008 00:56:36 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <4835DA7E.40304@v.loewis.de>
References: <482B80D5.8000202@canterbury.ac.nz>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
	<482C1293.3030409@egenix.com>
	<loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<4835DA7E.40304@v.loewis.de>
Message-ID: <20080522205636.GA8561@phd.pp.ru>

On Thu, May 22, 2008 at 10:41:34PM +0200, "Martin v. Löwis" wrote:
> Would "encoding/errorhandler" sound like a useful syntax?

   encoding:errorhandler

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From guido at python.org  Thu May 22 23:16:31 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 22 May 2008 14:16:31 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <20080522205636.GA8561@phd.pp.ru>
References: <482B80D5.8000202@canterbury.ac.nz>
	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
	<482C1293.3030409@egenix.com> <loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<4835DA7E.40304@v.loewis.de> <20080522205636.GA8561@phd.pp.ru>
Message-ID: <ca471dc20805221416h7b824303i29acecc1589e4860@mail.gmail.com>

On Thu, May 22, 2008 at 1:56 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:
> On Thu, May 22, 2008 at 10:41:34PM +0200, "Martin v. Löwis" wrote:
>> Would "encoding/errorhandler" sound like a useful syntax?
>
>   encoding:errorhandler

Whichever character is guaranteed never to be part of an encoding
name. All things being equal I'd prefer ':' too, since that's a pretty
common separator in environment variables, and doesn't make it look
like a pathname.
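
A minimal sketch of how a setting in the "encoding:errorhandler" form could be
split and validated. The function name parse_io_encoding and the "strict"
fallback are assumptions made for illustration; nothing in this thread fixes
the name of the environment variable or the parsing rules.

```python
import codecs


def parse_io_encoding(value, default_errors="strict"):
    """Split a hypothetical 'encoding:errorhandler' setting into two parts."""
    encoding, sep, errors = value.partition(":")
    if not sep or not errors:
        errors = default_errors    # no handler given: fall back to the default
    codecs.lookup(encoding)        # raises LookupError for an unknown encoding
    codecs.lookup_error(errors)    # raises LookupError for an unknown handler
    return encoding, errors


print(parse_io_encoding("ascii:backslashreplace"))
print(parse_io_encoding("utf-8"))
```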

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Thu May 22 23:18:50 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 22 May 2008 14:18:50 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <1211486349.5825.14.camel@fsol>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<8763tgl19a.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
	<482C1293.3030409@egenix.com> <loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
Message-ID: <ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>

> On Thursday, 22 May 2008 at 10:55 -0700, Guido van Rossum wrote:
>> Is this thread reaching a conclusion yet? I am hoping I can soon
>> accept some variant of the following:
>>
>> 1. repr() returns a Unicode string containing only printable Unicode
>> characters, using \x\u\U escapes for characters that are not
>> considered printable according to some version of the Unicode standard
>> augmented with some Python practicality, but unaffected by platform or
>> locale. This can be implemented efficiently, without having to load
>> the whole Unicode database, at least for strings containing only a
>> large subset of the Unicode character set (e.g. all of UCS2, and
>> possibly whole ranges of UCS4).
>>
>> 2. If you don't want any non-ASCII printed to a file, set the file's
>> encoding to ASCII and the error handler to backslashescape.

On Thu, May 22, 2008 at 12:59 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Since some people still seem wary that repr() might return non-ascii
> results, perhaps we could also:
>
> 3. Add a builtin function named ascii() and a formatting code "%a" that
> both call repr() internally and then convert all non-ascii characters to
> \uXXXX escapes.

I'd call that a stretch goal, but it seems an easy one.

> 2to3 might even replace all occurrences of repr() by ascii(), to err on
> the safe side.

I'd be against that.

Could someone (Atsuo?) write up a new version for the PEP, adding the
conclusions reached in this thread and recapping some of the
discussion? I think this can get in before the first beta release, and
that seems doable.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ishimoto at gembook.org  Fri May 23 03:46:31 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 23 May 2008 10:46:31 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805142022w63bdb169lb2effd73ec3f6af5@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
	<482C1293.3030409@egenix.com> <loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
Message-ID: <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>

On Fri, May 23, 2008 at 6:18 AM, Guido van Rossum <guido at python.org> wrote:

>>> 2. If you don't want any non-ASCII printed to a file, set the file's
>>> encoding to ASCII and the error handler to backslashescape.
>
> On Thu, May 22, 2008 at 12:59 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>> Since some people still seem wary that repr() might return non-ascii
>> results, perhaps we could also:
>>
>> 3. Add a builtin function named ascii() and a formatting code "%a" that
>> both call repr() internally and then convert all non-ascii characters to
>> \uXXXX escapes.
>
> I'd call that a stretch goal, but it seems an easy one.

Martin may be against a new builtin function. Perhaps
string.asciirepr() might be better?

>
> Could someone (Atsuo?) write up a new version for the PEP, adding the
> conclusions reached in this thread and recapping some of the
> discussion? I think this can get in before the first beta release, and
> that seems doable.
>

I'll revise the PEP and the patch soon.
One point that still remains is the default error handler for sys.stdout. I
can live with the 'strict' error handler, but I think raising exceptions for
every unsupported character by default is too exacting.

From guido at python.org  Fri May 23 06:30:35 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 22 May 2008 21:30:35 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<79990c6b0805150306i19fb3b8bqfa1dd77a583145b1@mail.gmail.com>
	<482C1293.3030409@egenix.com> <loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
Message-ID: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>

On Thu, May 22, 2008 at 6:46 PM, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
> On Fri, May 23, 2008 at 6:18 AM, Guido van Rossum <guido at python.org> wrote:
>
>>>> 2. If you don't want any non-ASCII printed to a file, set the file's
>>>> encoding to ASCII and the error handler to backslashescape.
>>
>> On Thu, May 22, 2008 at 12:59 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>>> Since some people still seem wary that repr() might return non-ascii
>>> results, perhaps we could also:
>>>
>>> 3. Add a builtin function named ascii() and a formatting code "%a" that
>>> both call repr() internally and then convert all non-ascii characters to
>>> \uXXXX escapes.
>>
>> I'd call that a stretch goal, but it seems an easy one.
>
> Martin may be against a new builtin function. Perhaps
> string.asciirepr() might be better?

That's not a pretty name (and aren't we going to get rid of the string
module after all?). But it's a minor detail.

>> Could someone (Atsuo?) write up a new version for the PEP, adding the
>> conclusions reached in this thread and recapping some of the
>> discussion? I think this can get in before the first beta release, and
>> that seems doable.
>>
>
> I'll revise the PEP and the patch soon.

Great!

> One point that still remains is the default error handler for sys.stdout. I
> can live with the 'strict' error handler, but I think raising exceptions for
> every unsupported character by default is too exacting.

I think to avoid exceptions you should arrange for the encoding to be
capable of encoding all characters (e.g. utf8 or utf16).

IMO it's important to trust that you didn't write garbage, unless you
specifically asked for it.

It's different for stderr, there I think the most lenient error
handling should be the default.
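
The trade-off can be sketched with in-memory streams standing in for stdout
and stderr. The helper write_text is hypothetical and only exists for this
illustration; the handler shown for the lenient case is "backslashreplace".

```python
import io


def write_text(text, encoding, errors):
    """Write text through an in-memory text stream; return the bytes produced."""
    buf = io.BytesIO()
    stream = io.TextIOWrapper(buf, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return buf.getvalue()


s = "\u30d1\u30a4\u30bd\u30f3"

# strict + an encoding that cannot represent the text: an exception, not garbage
try:
    write_text(s, "ascii", "strict")
except UnicodeEncodeError as exc:
    print("strict/ascii failed:", exc.reason)

# strict + an encoding that covers all of Unicode: no exception
print(write_text(s, "utf-8", "strict"))

# a lenient handler, as suggested for stderr: degraded output, but no failure
print(write_text(s, "ascii", "backslashreplace"))
```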

PS> I couldn't get backslashescape to work -- is this just a proposal?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From mwm at mired.org  Tue May  6 17:36:19 2008
From: mwm at mired.org (Mike Meyer)
Date: Tue, 06 May 2008 15:36:19 -0000
Subject: [Python-3000] [Python-Dev] Reminder: last alphas next Wednesday
 07-May-2008
In-Reply-To: <20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
References: <05760A6A-4F0F-4466-A640-062AE94E0470@python.org>
	<481A35D8.60604@cheimes.de>
	<ca471dc20805011645q5d4c0f63wd9d739abcc40de31@mail.gmail.com>
	<20080502000324.25821.1160563467.divmod.xquotient.7178@joule.divmod.com>
Message-ID: <20080506112925.023901a1@mbook-fbsd>

On Fri, 02 May 2008 00:03:24 -0000 glyph at divmod.com wrote:

> On 11:45 pm, guido at python.org wrote:
> >I like this, except one issue: I really don't like the .local
> >directory. I don't see any compelling reason why this needs to be
> >~/.local/lib/ -- IMO it should just be ~/lib/. There's no need to hide
> >it from view, especially since the user is expected to manage this
> >explicitly.
> 
> I've previously given a spirited defense of ~/.local on this list ( 
> http://mail.python.org/pipermail/python-dev/2008-January/076173.html ) 
> among other places.
> 
> Briefly, "lib" is not the only directory participating in this 
> convention; you've also got the full complement of other stuff that 
> might go into an installation like /usr/local.  So, while "lib" might 
> annoy me a little, "bin etc games include lib lib32 man sbin share src" 
> is going to get ugly pretty fast, especially if this is what comes up in 
> Finder or Nautilus or Explorer every time I open a window.

You have a problem with 10 directories? Well, ok - if you have that on
top of all the clutter that you normally get, yeah, I might object
too. On the other hand, if *every* application used those 10
directories - and *only* those 10 directories - for all the files it
needed that weren't for user-created data, that would be heaven.

The fallacy you're falling into is that users never have to deal with
those dot-files (or directories). They do. One of the most common
operations when trying to diagnose a misbehaving application is
"delete the configuration files" (my favorite is that I fix gnucash
printing failures by deleting CUPS config files....), and the user has
to figure out which, if any, of those magic files need to be
deleted. If you're using Finder, you wind up turning on the preference
that says "show me those", and suddenly your nice, clean directory
explodes into ...

Well, here's my home directory, shared between a Mac and a Unix box:

mbook-fbsd% cd
mbook-fbsd% ls | wc -l
      42
mbook-fbsd% ls -d .* | wc -l
     174

It's not very clean. Because it's a Mac, it's got some directories
that the Mac felt I needed that I really have no use for. And there's
maybe a dozen files there that are scratch files from various things I
haven't cleaned up yet. Of course, the dot-files are much worse,
because I normally don't see them, so there's no incentive to clean
them up at all.

But if I could trade those 172 (can't lose . and ..) "hidden" .files
for 10 visible directories in ~, I'd do it in an instant - even if I
didn't already have bin, etc, src & lib directories there.

> Put another way - it's trivial to make ~/.local/lib show up by 
> symlinking ~/lib, but you can't make ~/lib disappear, and lots of 
> software ends up looking at ~.

Just for the record, it's equally trivial - but better - to make
".local" disappear by symlinking '.local' to '.'. But providing an
option is even cleaner, and then the fact that you can't use symlink
to hide one is moot.

As far as I'm concerned, .local is the worst possible choice for this
name. Not only does it wind up in the more cluttered
of the two name spaces, it doesn't tell me anything about the
application(s) it belongs to, so I have to worry about it pretty much
every time I'm mucking about with the config files. .python would be
much better - at least I'd know what it was for by the name.

     <mike
-- 
Mike Meyer <mwm at mired.org>		http://www.mired.org/consulting.html
Independent Network/Unix/Perforce consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

From ocean at m2.ccsnet.ne.jp  Thu May  8 04:13:58 2008
From: ocean at m2.ccsnet.ne.jp (Hirokazu Yamamoto)
Date: Thu, 08 May 2008 02:13:58 -0000
Subject: [Python-3000] [Python-Dev] Releasing alphas tonight
References: <E7F4F775-76F9-4A1F-9910-7787D645FCC3@python.org>
	<48224E9F.40407@cheimes.de>
Message-ID: <001801c8b0ac$9b902dc0$0200a8c0@whiterabc2znlh>

Hello.

> The py3k branch has a major show stopper, It's leaking references to the
> max.

Is there any chance this leak also will be fixed?
http://bugs.python.org/issue2222

Thank you.

From paul.bedaride at gmail.com  Fri May  9 22:34:18 2008
From: paul.bedaride at gmail.com (paul bedaride)
Date: Fri, 9 May 2008 22:34:18 +0200
Subject: [Python-3000] class style
Message-ID: <fa7d4c4f0805091334k12ac2ffbuf6ce86f9bcbed1aa@mail.gmail.com>

 Hello,

I'm new on this list, and I just want to ask a question about class style in
Python 3000, because I don't understand, for instance, why in
class Example(object):
    var1 = 'example'
    var2 = property(fget=lambda self: 'example')

var1 seems to be bound to the class and var2 to the instance.
Moreover, it does not seem possible to define a class property, which could be
useful.
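
A brief sketch of the behaviour in question under Python 3. The names Meta,
var3 and WithClassProperty are made up for illustration, and placing the
property on a metaclass is only one possible way to get a class-level
property.

```python
class Example(object):
    var1 = 'example'
    var2 = property(fget=lambda self: 'example')


# A plain attribute is readable through the class itself ...
print(Example.var1)
# ... while a property is a descriptor whose getter needs an instance;
# looked up on the class, it returns the property object itself.
print(isinstance(Example.var2, property))
print(Example().var2)


class Meta(type):
    @property
    def var3(cls):
        # Evaluated when the attribute is read on the class.
        return 'example'


class WithClassProperty(metaclass=Meta):
    pass


print(WithClassProperty.var3)
# The metaclass property is not visible through instances.
print(hasattr(WithClassProperty(), 'var3'))
```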

I don't know if you have already discussed class style, but if you have,
could you point me to the discussion?

thanks in advance

paul bedaride

From ishimoto at gembook.org  Fri May 23 09:28:03 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Fri, 23 May 2008 16:28:03 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<482C1293.3030409@egenix.com> <loom.20080522T115632-47@post.gmane.org>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
Message-ID: <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>

On Fri, May 23, 2008 at 1:30 PM, Guido van Rossum <guido at python.org> wrote:

>> One point that still remains is the default error handler for sys.stdout. I
>> can live with the 'strict' error handler, but I think raising exceptions for
>> every unsupported character by default is too exacting.
>
> I think to avoid exceptions you should arrange for the encoding to be
> capable of encoding all characters (e.g. utf8 or utf16).

The utf-8 console is fine for my personal development style, but I'm
afraid it doesn't work for you. Whether or not your console is capable of
displaying Japanese characters, you will want to see Japanese
characters as hex-escaped characters, won't you?

>
> IMO it's important to trust that you didn't write garbage, unless you
> specifically asked for it.

Is this requested by users? With Python 2, we can always print strings
containing garbage without exceptions. Python 3 is much stricter in
this respect. To get meaningful information instead of tracebacks, we
need to know the encoding of the output device and the characters to be
printed whenever we print strings. This is hard to accomplish in
practice.

> PS> I couldn't get backslashescape to work -- is this just a proposal?

No. Works for me without any modifications. I tried with the latest source from svn.

Python 3.0a5+ (py3k:63546, May 23 2008, 13:42:06) [MSC v.1500 32 bit (Intel)] on
 win32
>>> "パイソン".encode("ascii", "backslashreplace")
b'\\u30d1\\u30a4\\u30bd\\u30f3'
[39364 refs]

From ncoghlan at gmail.com  Fri May 23 10:39:01 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 23 May 2008 18:39:01 +1000
Subject: [Python-3000] class style
In-Reply-To: <fa7d4c4f0805091334k12ac2ffbuf6ce86f9bcbed1aa@mail.gmail.com>
References: <fa7d4c4f0805091334k12ac2ffbuf6ce86f9bcbed1aa@mail.gmail.com>
Message-ID: <483682A5.8080103@gmail.com>

paul bedaride wrote:
> Hello,
> 
> I'm new on this list

The question you asked is more appropriate for comp.lang.python, not 
python-dev/python-3000 (which are about the development *of* Python, not 
development *with* Python).

Regards,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From stefan_ml at behnel.de  Fri May 23 12:44:00 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Fri, 23 May 2008 12:44:00 +0200
Subject: [Python-3000] Single buffer implied in new buffer protocol?
Message-ID: <g1675f$hf4$1@ger.gmane.org>

Hi,

while implementing Py_buffer support in Cython, I noticed (the hard way,
through a segfault) that the buffer pointer passed into getbuffer() can be
NULL, e.g. when calling memoryview.tobytes(). According to PEP 3118 (first
paragraph below the getbuffer() signature), this implies setting a lock on the
memory. Funnily enough, the LOCK flag wasn't even set in my case; I just got
NULL as buffer and 285 as flags...

Anyway, my point is that this part of the protocol actually implies setting a
lock on the buffer *provider* rather than the buffer itself, as the buffer
provider cannot distinguish between different buffers based on a NULL pointer.

I know, the protocol is overly complex already and hard to implement from a
provider perspective, and I understand that that was preferred over putting
the complexity into the consumer. But wouldn't it make more sense to *always*
pass the buffer pointer, to let the provider decide what it makes of the
flags? I can well imagine the case where a buffer provider chooses to return
different buffer pointers based on the WRITABLE flag, for example. In that
case, it would be unable to attribute the lock to any of the buffers.

Stefan


From guido at python.org  Fri May 23 16:22:00 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 23 May 2008 07:22:00 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<loom.20080522T115632-47@post.gmane.org> <483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
Message-ID: <ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>

On Fri, May 23, 2008 at 12:28 AM, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
> On Fri, May 23, 2008 at 1:30 PM, Guido van Rossum <guido at python.org> wrote:
>
>>> One point still remains is default error handler for sys.stdout. I can
>>> live with 'strict' error handler, but I think raising exceptions for
>>> every unsupported character by default is too exacting.
>>
>> I think to avoid exceptions you should arrange for the encoding to be
>> capable of encoding all characters (e.g. utf8 or utf16).
>
> The utf-8 console is fine for my personal development style, I'm
> afraid it doesn't work for you. Whether your console is capable to
> display Japanese characters or not, you will want to see Japanese
> characters in hex-escaped characters, don't you?

Personally, I can live with it. I rarely generate Japanese text so I
doubt it'll be a problem. I can also change the console encoding and
error handler.

>> IMO it's important to trust that you didn't write garbage, unless you
>> specifically asked for it.
>
> Is this requested by users? With Python 2, we can always print strings
> containing garbage without exceptions. Python 3 is much stricter in
> this respect. To get meaningful information instead of tracebacks, we
> need to know encoding of output device and characters to be printed
> whenever we print strings. This is hard to be accomplished in
> practice.

Tracebacks should always go to stderr.

What I meant by "not writing garbage" was for some app that e.g. acts
like a filter or otherwise produces output (on stdout) for another
program to consume. The other program might not understand \u escapes.
I'd rather trap this when writing, not when reading the garbage
several stages later.

IOW:

- stderr (and probably also interactive stdout): set backslashreplace
- stdout (if not interactive): strict

Default encoding taken from environment in all cases.

>> PS> I couldn't get backslashescape to work -- is this just a proposal?
>
> No. Works for me without any modifications. I tried with the latest source from svn.
>
> Python 3.0a5+ (py3k:63546, May 23 2008, 13:42:06) [MSC v.1500 32 bit (Intel)] on
>  win32
>>>> "パイソン".encode("ascii", "backslashreplace")
> b'\\u30d1\\u30a4\\u30bd\\u30f3'
> [39364 refs]

Ah, backslashreplace, not backslashescape. :-)


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From ishimoto at gembook.org  Fri May 23 17:05:37 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Sat, 24 May 2008 00:05:37 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
Message-ID: <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>

2008/5/23 Guido van Rossum <guido at python.org>:
> Personally, I can live with it. I rarely generate Japanese text so I
> doubt it'll be a problem. I can also change the console encoding and
> error handler.

You may rarely generate Japanese text, but I guess you often receive
non-ASCII text data, e.g. spam mail in Japanese, Rietveld comments in
Spanish, etc. Predicting the encoding of data is hard these days.

>
> What I meant by "not writing garbage" was for some app that e.g. acts
> like a filter or otherwise produces output (on stdout) for another
> program to consume. The other program might not understand \u escapes.
> I'd rather trap this when writing, not when reading the garbage
> several stages later.
>
> IOW:
>
> - stderr (and probably also interactive stdout): set backslashreplace
> - stdout (if not interactive): strict
>
> Default encoding taken from environment in all cases.

Fine with me. I'll update the PEP and patch. Thank you!

From stephen at xemacs.org  Fri May 23 22:42:54 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 24 May 2008 05:42:54 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<483566A7.6050106@egenix.com>
	<87k5hmxm2z.fsf@uwakimon.sk.tsukuba.ac.jp>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
Message-ID: <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>

Atsuo Ishimoto writes:
 > 2008/5/23 Guido van Rossum <guido at python.org>:
 > > Personally, I can live with it. I rarely generate Japanese text so I
 > > doubt it'll be a problem. I can also change the console encoding and
 > > error handler.
 > 
 > While you rarely generate Japanese text, but I guess you often get
 > non-ASCII text data e.g. SPAM mail in Japanese, Rietveld comments in
 > Spanish, etc. Forecasting encoding of data is hard in these days.

I don't see the problem.  You don't have to forecast the encoding of
data.  Strings are Unicode in Python internal format.  The question is
whether the device receiving the output of repr can handle all of the
characters that will be generated.

From ishimoto at gembook.org  Sat May 24 04:04:48 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Sat, 24 May 2008 11:04:48 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>

On Sat, May 24, 2008 at 5:42 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Atsuo Ishimoto writes:
>  > 2008/5/23 Guido van Rossum <guido at python.org>:
>  > > Personally, I can live with it. I rarely generate Japanese text so I
>  > > doubt it'll be a problem. I can also change the console encoding and
>  > > error handler.
>  >
>  > While you rarely generate Japanese text, but I guess you often get
>  > non-ASCII text data e.g. SPAM mail in Japanese, Rietveld comments in
>  > Spanish, etc. Forecasting encoding of data is hard in these days.
>
> I don't see the problem.  You don't have to forecast the encoding of
> data.  Strings are Unicode in Python internal format.  The question is
> whether the device receiving the output of repr can handle all of the
> characters that will be generated.
>

Yes. My question is "Which do you feel more comfortable with, printing correct
glyphs or hex-escaped ASCII?". I prefer printed glyphs for foreign
characters, but I had the feeling that Western people prefer hex-escaped
ASCII in general. But from the responses I saw, perhaps this is not a big
deal.

From guido at python.org  Sat May 24 07:01:11 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 23 May 2008 22:01:11 -0700
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
Message-ID: <ca471dc20805232201v14516d3sba7aab4272c35258@mail.gmail.com>

On Fri, May 23, 2008 at 7:04 PM, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
> On Sat, May 24, 2008 at 5:42 AM, Stephen J. Turnbull <stephen at xemacs.org> wrote:
>> Atsuo Ishimoto writes:
>>  > 2008/5/23 Guido van Rossum <guido at python.org>:
>>  > > Personally, I can live with it. I rarely generate Japanese text so I
>>  > > doubt it'll be a problem. I can also change the console encoding and
>>  > > error handler.
>>  >
>>  > While you rarely generate Japanese text, but I guess you often get
>>  > non-ASCII text data e.g. SPAM mail in Japanese, Rietveld comments in
>>  > Spanish, etc. Forecasting encoding of data is hard in these days.
>>
>> I don't see the problem.  You don't have to forecast the encoding of
>> data.  Strings are Unicode in Python internal format.  The question is
>> whether the device receiving the output of repr can handle all of the
>> characters that will be generated.
>
> Yes. My question is "Which do you feel comfortable, printing correct
> glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign
> characters, but I had feeling that western people prefer hex-escaped
> ASCII in general. But from responses I saw, perhaps this is not big
> deal.

I've certainly gotten over it, and have come to appreciate your point of view.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From tjreedy at udel.edu  Sat May 24 07:07:50 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 24 May 2008 01:07:50 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com><1211486349.5825.14.camel@fsol><ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com><797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com><ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com><797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com><ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com><797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com><87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
Message-ID: <g187r6$iet$1@ger.gmane.org>


"Atsuo Ishimoto" <ishimoto at gembook.org> wrote in message 
news:797440730805231904y501d310fw124ccd0e37defd3b at mail.gmail.com...
| Yes. My question is "Which do you feel comfortable, printing correct
| glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign
| characters, but I had feeling that western people prefer hex-escaped
| ASCII in general. But from responses I saw, perhaps this is not big
| deal.

Given that my system displays most major alphabets, and that I can 
recognize most, the glyphs are more informative for informal purposes than 
seemingly 'random' codes.


From ncoghlan at gmail.com  Sat May 24 09:33:19 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sat, 24 May 2008 17:33:19 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <g187r6$iet$1@ger.gmane.org>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com><ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com><1211486349.5825.14.camel@fsol><ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com><797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com><ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com><797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com><ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com><797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com><87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<g187r6$iet$1@ger.gmane.org>
Message-ID: <4837C4BF.7070302@gmail.com>

Terry Reedy wrote:
> "Atsuo Ishimoto" <ishimoto at gembook.org> wrote in message 
> news:797440730805231904y501d310fw124ccd0e37defd3b at mail.gmail.com...
> | Yes. My question is "Which do you feel comfortable, printing correct
> | glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign
> | characters, but I had feeling that western people prefer hex-escaped
> | ASCII in general. But from responses I saw, perhaps this is not big
> | deal.
> 
> Given that my system displays most major alphabets, and that I can 
> recognize most, the glyphs are more informative for informal purposes than 
> seemingly 'random' codes.

The same goes for me - Konsole displays all sorts of Unicode glyphs just 
fine. I can actually read Japanese kana and the Cyrillic alphabet a heck 
of a lot better than I can read Unicode hex escapes, purely because the 
additional symbols are more distinctive than a relatively arbitrary 
collection of numbers :)

Cheers,
Nick.


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From stephen at xemacs.org  Sat May 24 11:37:11 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sat, 24 May 2008 18:37:11 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
Message-ID: <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>

Atsuo Ishimoto writes:

 > Yes. My question is "Which do you feel comfortable, printing correct
 > glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign
 > characters, but I had feeling that western people prefer hex-escaped
 > ASCII in general. But from responses I saw, perhaps this is not big
 > deal.

I think Americans, at least, tend to fear that non-ASCII will be
interpreted as terminal control sequences or highlighted annoyingly in
some way.  Otherwise, they might grumble about the fact that what
they're seeing isn't English, but it doesn't matter whether it's
hex-escaped or kanji.

From ishimoto at gembook.org  Sat May 24 12:49:34 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Sat, 24 May 2008 19:49:34 +0900
Subject: [Python-3000] UPDATED: PEP 3138- String representation in Python
	3000
Message-ID: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>

I updated PEP 3138 - String representation in Python 3000.
The Python wiki page is also updated. (http://wiki.python.org/moin/Python3kStringRepr)

I would appreciate your comments and help.

-----------------------------------------------

PEP: 3138

Title: String representation in Python 3000
Version: $Revision$
Last-Modified: $Date$
Author: Atsuo Ishimoto <ishimoto--at--gembook.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created:  05-May-2008
Post-History:


Abstract
========

This PEP proposes a new string representation form for Python 3000. In
Python prior to Python 3000, the ``repr()`` built-in function converts
arbitrary objects to printable ASCII strings for debugging and logging.
For Python 3000, a wider range of characters, based on the Unicode
standard, should be considered 'printable'.


Motivation
==========

The current ``repr()`` converts 8-bit strings to ASCII using the following
algorithm.

- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

- Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII
  characters(>=0x80) to '\\xXX'.

- Backslash-escape quote characters(apostrophe, ') and add the quote
  character at the beginning and the end.

For Unicode strings, the following additional conversions are done.

- Convert leading surrogate pair characters without trailing character
  (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

- Convert 16-bit characters(>=0x100) to '\\uXXXX'.

- Convert 21-bit characters(>=0x10000) and surrogate pair characters to
  '\\U00xxxxxx'.

This algorithm converts any string to printable ASCII, and ``repr()`` is
used as a handy and safe way to print strings for debugging or for
logging. Although all non-ASCII characters are escaped, this does not
matter when most of the string's characters are ASCII. But for other
languages, such as Japanese where most characters in a string are not
ASCII, this is very inconvenient. Python 3000 has a lot of nice features
for non-Latin users such as non-ASCII identifiers, so it would be
helpful if Python could also progress in a similar way for printable
output.

Some users might be concerned that such output will mess up their
console if they print binary data like images. But this is unlikely to
happen in practice because bytes and strings are different types in
Python 3000, so printing an image to the console won't mess it up.

This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected.


Specification
=============

- Add Python API ``int PY_UNICODE_ISPRINTABLE(Py_UNICODE ch)``.
  ``PY_UNICODE_ISPRINTABLE()`` returns 0 if ``repr()`` should escape the
  Unicode character ``ch``, 1 otherwise. Characters that should be escaped are:

  * Characters defined in the Unicode character database as "Other"(Cc,
    Cf, Cs, Co, Cn).

  * Characters defined in the Unicode character database as "Separator"
    (Zl, Zp, Zs) other than ASCII space(0x20).

- The algorithm to build ``repr()`` strings should be changed to:

  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

  * Convert non-printable ASCII characters(0x00-0x1f, 0x7f) to '\\xXX'.

  * Convert leading surrogate pair characters without trailing character
    (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

  * Convert non-printable characters(PY_UNICODE_ISPRINTABLE() returns 0)
    to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.

  * Backslash-escape quote characters(apostrophe, ') and add quote
    character at the beginning and the end.

- Set the Unicode error-handler for sys.stderr to 'backslashreplace' by
  default.

- Set the Unicode error-handler for sys.stdout in the Python interactive
  session to 'backslashreplace' by default.

- Add ``'%a'`` string format operator. ``'%a'`` converts any Python
  object to a string using ``repr()`` and then hex-escapes all non-ASCII
  characters. The ``'%a'`` operator generates the same string as ``'%r'`` in
  Python 2.

- Add ``ascii()`` builtin function. ``ascii()`` converts any Python
  object to a string using ``repr()`` and then hex-escapes all non-ASCII
  characters. ``ascii()`` generates the same string as ``repr()`` in Python 2.

- Add ``isprintable()`` method to the string type. ``str.isprintable()``
  returns False if ``repr()`` should escape any character in the string,
  True otherwise. The ``isprintable()`` method calls
  ``PY_UNICODE_ISPRINTABLE()`` internally.
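
The user-visible parts of the list above can be tried out as follows. This is only an illustration and assumes a released Python 3 interpreter, where ``repr()``, ``ascii()``, ``'%a'`` and ``str.isprintable()`` exist; details may differ from this draft. Only ASCII is printed, so the snippet does not depend on the console encoding:

```python
s = "\u30d1\u30a4\u30bd\u30f3\n"   # katakana for "Python" plus a newline

shown = repr(s)       # keeps the four katakana, escapes only the newline
escaped = ascii(s)    # pure ASCII: every non-ASCII character is hex-escaped

print(escaped)
print(len(shown), len(escaped))      # the escaped form is much longer
print("%a" % s == escaped)           # '%a' produces the same text as ascii()

print("\u30d1".isprintable())        # printable: repr() leaves it alone
print("\n".isprintable())            # not printable: repr() escapes it
```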


Rationale
=========

The ``repr()`` in Python 3000 should be Unicode not ASCII based, just
like Python 3000 strings. Also, conversion should not be affected by the
locale setting, because the locale is not necessarily the same as the
output device's locale. For example, it is common for a daemon process
to be invoked in an ASCII setting, but to write UTF-8 to its log files.
Also, web applications might want to report the error information in
a more readable form based on the HTML page's encoding.

Characters not supported by the user's console are hex-escaped on printing
by the Unicode encoder's error-handler. If the error-handler of the
output file is 'backslashreplace', such characters are hex-escaped
without raising UnicodeEncodeError. For example, if your default
encoding is ASCII, ``print('Hello ¢')`` will print 'Hello \\xa2'.
If your encoding is ISO-8859-1, 'Hello ¢' will be printed.

For non-interactive sessions, the default error-handler of sys.stdout should
be 'strict'. Other applications reading the output might not
understand hex-escaped characters, so unsupported characters should be
trapped when writing.
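
The difference between the error-handlers can be sketched with in-memory streams instead of a real console. The helper ``write_text`` is hypothetical and exists only for this illustration; ``io.TextIOWrapper`` and ``io.BytesIO`` are standard library classes:

```python
import io


def write_text(text, encoding, errors):
    """Write text through a text stream and return the bytes that came out."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


text = "Hello \u00a2"   # U+00A2 CENT SIGN

# ASCII cannot encode the cent sign, so it is hex-escaped.
print(write_text(text, "ascii", "backslashreplace"))
# ISO-8859-1 can encode it, so the error-handler is never consulted.
print(write_text(text, "iso-8859-1", "backslashreplace"))

# With 'strict' the unsupported character is trapped when writing.
try:
    write_text(text, "ascii", "strict")
except UnicodeEncodeError as exc:
    print("strict stream refused:", exc.reason)
```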

Printable characters
--------------------

The Unicode standard doesn't define Non-printable characters, so we must
create our own definition. Here we propose to define Non-printable
characters as follows.

- Non-printable ASCII characters, as in Python 2.

- Broken surrogate pair characters.

- Characters defined in the Unicode character database as

  * Cc (Other, Control)
  * Cf (Other, Format)
  * Cs (Other, Surrogate)
  * Co (Other, Private Use)
  * Cn (Other, Not Assigned)
  * Zl (Separator, Line) ('\\u2028', LINE SEPARATOR)
  * Zp (Separator, Paragraph) ('\\u2029', PARAGRAPH SEPARATOR)
  * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
    this category should be escaped to avoid ambiguity.
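
The definition above can be approximated in pure Python with the ``unicodedata`` module. This is a rough sketch, not the proposed C implementation; ``is_printable``, ``NON_PRINTABLE`` and ``samples`` are hypothetical names used only here:

```python
import unicodedata

# Categories the draft treats as non-printable ("Other" and "Separator").
NON_PRINTABLE = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"}


def is_printable(ch):
    """Hypothetical helper: True if repr() would leave ch unescaped."""
    if ch == " ":   # ASCII space is the one Zs character that is kept as-is
        return True
    return unicodedata.category(ch) not in NON_PRINTABLE


samples = ["a", " ", "\n", "\u3042", "\u3000", "\u200b", "\ud800"]
for ch in samples:
    # ascii() keeps the output readable on any console.
    print(ascii(ch), unicodedata.category(ch), is_printable(ch))
```

On a released Python 3 the result can be compared against ``str.isprintable()`` for the same characters.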

Alternate Solutions
-------------------

To help debugging in non-Latin languages without changing ``repr()``,
other suggestions were made.

- Supply a tool to print lists or dicts.

  Strings to be printed for debugging are not only contained by lists or
  dicts, but also in many other types of object. File objects contain a
  file name in Unicode, exception objects contain a message in Unicode,
  etc. These strings should be printed in readable form when repr()ed.
  It is unlikely to be possible to implement a tool to print all
  possible object types.

- Use sys.displayhook and sys.excepthook.

  For interactive sessions, we can write hooks to restore hex-escaped
  characters to the original characters. But these hooks are called only
  when displaying the result of evaluating an expression entered in an
  interactive Python session, and don't work for the print() function, for
  non-interactive sessions or for logging.debug("%r", ...), etc.

- Subclass sys.stdout and sys.stderr.

  It is difficult to implement a subclass to restore hex-escaped
  characters since there isn't enough information left by the time it's
  a string to undo the escaping correctly in all cases. For example,
  ``print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
  the file object has no way to tell the two cases apart.

- Make the encoding used by ``unicode_repr()`` adjustable, and make the
  current ``repr()`` the default.

  With an adjustable ``repr()``, the result of ``repr()`` is unpredictable
  and it would be impossible to write correct code involving ``repr()``. And
  if the current ``repr()`` is the default, then the old convention remains
  intact and users may expect ASCII strings as the result of ``repr()``.
  Third-party applications or libraries could choke when a custom ``repr()``
  function is used.
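
The ambiguity described under "Subclass sys.stdout and sys.stderr" can be shown in a few lines. The variable names are only illustrative, and a katakana character stands in for the 'A' of the example above:

```python
# Text that really consists of a backslash followed by "u30d1" ...
literal = "\\" + "u30d1"

# ... and the escaped form of a single katakana character.
escaped = "\u30d1".encode("ascii", "backslashreplace").decode("ascii")

print(literal == escaped)   # True: a stream sees identical text either way
print(len(literal), len("\u30d1"))
```

A stream subclass that turned every such sequence back into a character would therefore corrupt the first kind of text.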


Backwards Compatibility
=======================

Changing ``repr()`` may break some existing code, especially testing
code. Five of Python's regression tests fail with this modification. If
you need ``repr()`` strings without non-ASCII characters, as in Python 2, you
can use the following function. ::

    def repr_ascii(obj):
        return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

For logging or for debugging, the following code can raise UnicodeEncodeError. ::

    log = open("logfile", "w")
    log.write(repr(data))     # UnicodeEncodeError will be raised
                              # if data contains unsupported characters.

To avoid raising exceptions, you can specify the error-handler explicitly. ::

    log = open("logfile", "w", errors="backslashreplace")
    log.write(repr(data))  # Unsupported characters will be escaped.


For a console with a Unicode-based encoding, for example, en_US.utf8 and
de_DE.utf8, the backslashreplace trick doesn't work and no printable
characters are escaped. This will cause a problem with similar-looking
characters in Western, Greek and Cyrillic languages. These
languages use similar (but different) alphabets (descended from a
common ancestor) and contain letters that look similar but have different
character codes. For example, it is hard to distinguish Latin 'a', 'e'
and 'o' from Cyrillic 'а', 'е' and 'о'. (The visual representation, of
course, very much depends on the fonts used but usually these letters
are almost indistinguishable.) To avoid the problem, users can adjust the
terminal encoding to get a result suitable for their environment
or use ``repr_ascii()`` described above.
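
A short illustration of the look-alike problem, assuming a released Python 3 where the ``ascii()`` builtin exists. It plays the same role as ``repr_ascii()`` here; the variable names are only illustrative:

```python
import unicodedata

latin = "aeo"
cyrillic = "\u0430\u0435\u043e"   # Cyrillic look-alikes of a, e and o

print(latin == cyrillic)          # False, although they may render alike
print(ascii(latin), ascii(cyrillic))

# The character names make the difference explicit.
for ch in cyrillic:
    print(ascii(ch), unicodedata.name(ch))
```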


Open Issues
===========

- Is the ``ascii()`` function necessary, or is documentation enough? If
  necessary, should ``ascii()`` belong to the builtin namespace?


Rejected Proposals
==================

- Add encoding and errors arguments to the builtin print() function,
  with defaults of sys.getfilesystemencoding() and 'backslashreplace'.

  Complicated to implement, and in general, this does not seem to be a good
  idea. [2]_

- Use character names to escape characters, instead of hex character
  codes. For example, ``repr('\u03b1')`` can be converted to
  ``"\N{GREEK SMALL LETTER ALPHA}"``.

  Using character names gets verbose compared to hex-escapes. e.g.,
  ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR KIRGHIZ YEH
  WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``.
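
The verbosity can be measured with the ``'namereplace'`` error-handler, which was added to Python much later (3.5) and is not part of this proposal. It is used here only as a stand-in for the rejected idea:

```python
import unicodedata

alpha = "\u03b1"
print(unicodedata.name(alpha))
print(alpha.encode("ascii", "namereplace"))        # name-based escape
print(alpha.encode("ascii", "backslashreplace"))   # hex escape

# Compare the lengths of both escapes for the ligature mentioned above.
ligature = "\ufbf9"
by_name = ligature.encode("ascii", "namereplace")
by_hex = ligature.encode("ascii", "backslashreplace")
print(len(by_name), len(by_hex))
```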


Reference Implementation
========================

http://bugs.python.org/issue2630


References
==========

.. [1] Multibyte string on string\::string_print
        (http://bugs.python.org/issue479898)

.. [2] [Python-3000] Displaying strings containing unicode escapes
        (http://mail.python.org/pipermail/python-3000/2008-April/013366.html)


Copyright
=========

This document has been placed in the public domain.

From jimjjewett at gmail.com  Sat May 24 18:53:08 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Sat, 24 May 2008 12:53:08 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>

On 5/24/08, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Atsuo Ishimoto writes:

>   > Yes. My question is "Which do you feel comfortable, printing correct
>   > glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign
>   > characters, but I had feeling that western people prefer hex-escaped
>   > ASCII in general. But from responses I saw, perhaps this is not big
>   > deal.

It depends on why I'm looking at it.  I do prefer hex for repr,
because hex is safer; if I want pretty, I'll use print (or pprint).

> I think Americans, at least, tend to fear that non-ASCII will be
>  interpreted as terminal control sequences or highlighted annoyingly in
>  some way.

Because it often is, even on systems that can display the proper
glyphs in other contexts -- and it isn't always possible to recover
from a messed-up terminal without restarting the session.  I'll grant
that this implies bugs in the programs I use -- but they happen enough
with enough different programs that it is a concern.

>  Otherwise, they might grumble about the fact that what
>  they're seeing isn't English, but it doesn't matter whether it's
>  hex-escaped or kanji.

I'm more worried that it might look like English, yet be subtly (and
importantly) different.  I can distinguish the characters in ASCII
pretty well, or at least recognize when something looks ambiguous.  I
cannot do that so well with other scripts -- but seeing a hex escape
warns me that something special is happening.

Note that I have no objection to properly displaying other characters
as a system-wide setting.  I'm glad that it is easy to do with print.

I just want it to be very easy to say "on my system, repr is ASCII".
I would prefer that ASCII also be the default, so that people who want
more characters opt in to receive them, at least once at installation
time.

-jJ

From phd at phd.pp.ru  Sat May 24 19:18:14 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sat, 24 May 2008 21:18:14 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
References: <ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
Message-ID: <20080524171814.GA4026@phd.pp.ru>

On Sat, May 24, 2008 at 12:53:08PM -0400, Jim Jewett wrote:
> if I want pretty, I'll use print (or pprint).

   str(container_of_strings) uses repr(), so you lose prettiness with either
print or '%s' % container_of_strings. Exceptions, for example, use repr() for
file names, which is very inconvenient, IMHO.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From phd at phd.pp.ru  Sat May 24 19:27:21 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Sat, 24 May 2008 21:27:21 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <20080524171814.GA4026@phd.pp.ru>
References: <797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
Message-ID: <20080524172721.GC4026@phd.pp.ru>

On Sat, May 24, 2008 at 09:18:14PM +0400, Oleg Broytmann wrote:
> On Sat, May 24, 2008 at 12:53:08PM -0400, Jim Jewett wrote:
> > if I want pretty, I'll use print (or pprint).
> 
>    str(container_of_strings) uses repr(), so you loose prettiness on either
> print or '%s' % container_of_strings. Exceptions use repr() for file names,
> e.g., which is very inconvenient, IMHO.

   I meant - you cannot print() an exception to make it pretty - it uses
repr() internally anyway. The only way to win back the prettiness is to
make repr() print printable strings without escaping.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From janssen at parc.com  Sat May 24 20:47:55 2008
From: janssen at parc.com (Bill Janssen)
Date: Sat, 24 May 2008 11:47:55 PDT
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp> 
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <08May24.114756pdt."58698"@synergy1.parc.xerox.com>

> Atsuo Ishimoto writes:
> 
>  > Yes. My question is "Which do you feel comfortable, printing collect
>  > glyphs or hex-escaped ASCII ?". I prefer printed glyphs for foreign
>  > characters, but I had feeling that western people prefer hex-escaped
>  > ASCII in general. But from responses I saw, perhaps this is not big
>  > deal.
> 
> I think Americans, at least, tend to fear that non-ASCII will be
> interpreted as terminal control sequences or highlighted annoyingly in
> some way.  Otherwise, they might grumble about the fact that what
> they're seeing isn't English, but it doesn't matter whether it's
> hex-escaped or kanji.

The nice thing about hex-escaped characters is that I can look up the
character code to find out what the character is.  Hard to do that
with a glyph that I don't recognize.

Bill


From martin at v.loewis.de  Sat May 24 21:20:57 2008
From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=)
Date: Sat, 24 May 2008 21:20:57 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <08May24.114756pdt."58698"@synergy1.parc.xerox.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>	<1211486349.5825.14.camel@fsol>	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<08May24.114756pdt."58698"@synergy1.parc.xerox.com>
Message-ID: <48386A99.1000800@v.loewis.de>

> The nice thing about hex-escaped characters is that I can look up the
> character code to find out what the character is.  Hard to do that
> with a glyph that I don't recognize.

Not that difficult. Suppose I have the character Ә, I just do

py> unicodedata.name(u"Ә")
'CYRILLIC CAPITAL LETTER SCHWA'

I used cut-n-paste to insert the character into the interactive prompt;
that worked just fine.

Regards,
Martin

From tjreedy at udel.edu  Sat May 24 22:15:28 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sat, 24 May 2008 16:15:28 -0400
Subject: [Python-3000] UPDATED: PEP 3138- String representation in
	Python3000
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
Message-ID: <g19t10$t1v$1@ger.gmane.org>


|
| - Add ``isprintable()`` method to the string type. ``str.isprintable()``
|  return True if ``repr()`` should escape the characters in the string,
|  False otherwise.

Isn't this backwards?  To me, isprintable means should *not* escape.


From janssen at parc.com  Sun May 25 00:26:49 2008
From: janssen at parc.com (Bill Janssen)
Date: Sat, 24 May 2008 15:26:49 PDT
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <48386A99.1000800@v.loewis.de> 
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805221055j52594fd2offb7fa3fcf936629@mail.gmail.com>
	<1211486349.5825.14.camel@fsol>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<08May24.114756pdt."58698"@synergy1.parc.xerox.com>
	<48386A99.1000800@v.loewis.de>
Message-ID: <08May24.152651pdt."58698"@synergy1.parc.xerox.com>

> Not that difficult. Suppose I have the character ??, I just do
> 
> py> unicodedata.name(u"??")
> 'CYRILLIC CAPITAL LETTER SCHWA'
> 
> I used cut-n-paste to insert the character into the interactive prompt;
> that worked just fine.

I suppose, if I knew about unicodedata.name(), and if my cursed
command-line terminal supported cut-and-paste.  Between rxvt, xterm,
Emacs shell buffers, Windows command shells, and OS X Terminal.app
windows, I find it hard to know just what will and will not work in
that regard.

Bill


From ncoghlan at gmail.com  Sun May 25 02:45:30 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Sun, 25 May 2008 10:45:30 +1000
Subject: [Python-3000] UPDATED: PEP 3138- String representation
	in	Python3000
In-Reply-To: <g19t10$t1v$1@ger.gmane.org>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<g19t10$t1v$1@ger.gmane.org>
Message-ID: <4838B6AA.5000207@gmail.com>

Terry Reedy wrote:
> |
> | - Add ``isprintable()`` method to the string type. ``str.isprintable()``
> |  return True if ``repr()`` should escape the characters in the string,
> |  False otherwise.
> 
> Is not this backwards?  Isprintable to me mean should *not* escape.

I agree (I suspect the incorrect phrasing is due to the fact that this 
query method used to ask the opposite question - then the name got 
changed without updating the description)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ishimoto at gembook.org  Sun May 25 06:10:52 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Sun, 25 May 2008 13:10:52 +0900
Subject: [Python-3000] UPDATED: PEP 3138- String representation in
	Python3000
In-Reply-To: <4838B6AA.5000207@gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<g19t10$t1v$1@ger.gmane.org> <4838B6AA.5000207@gmail.com>
Message-ID: <797440730805242110r3cccc6f7p80ace3888c3b5336@mail.gmail.com>

On Sun, May 25, 2008 at 9:45 AM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Terry Reedy wrote:
>>
>> |
>> | - Add ``isprintable()`` method to the string type. ``str.isprintable()``
>> |  return True if ``repr()`` should escape the characters in the string,
>> |  False otherwise.
>>
>> Is not this backwards?  Isprintable to me mean should *not* escape.
>
> I agree (I suspect the incorrect phrasing is due to the fact that this query
> method used to ask the opposite question - then the name got changed without
> updating the description)

Dang, I'm sorry for the dumb mistake. Your suspicion is right :).

I updated the PEP, with some additions to the Motivation section as per
Oleg's advice.

-----------------------------------------------

PEP: 3138

Title: String representation in Python 3000
Version: $Revision$
Last-Modified: $Date$
Author: Atsuo Ishimoto <ishimoto--at--gembook.org>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created:  05-May-2008
Post-History:


Abstract
========

This PEP proposes a new string representation form for Python 3000. In
Python prior to Python 3000, the ``repr()`` built-in function converts
arbitrary objects to printable ASCII strings for debugging and logging.
For Python 3000, a wider range of characters, based on the Unicode
standard, should be considered 'printable'.


Motivation
==========

The current ``repr()`` converts 8-bit strings to ASCII using the following
algorithm.

- Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

- Convert other non-printable characters(0x00-0x1f, 0x7f) and non-ASCII
  characters(>=0x80) to '\\xXX'.

- Backslash-escape quote characters(apostrophe, ') and add the quote
  character at the beginning and the end.

For Unicode strings, the following additional conversions are done.

- Convert leading surrogate pair characters without trailing character
  (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

- Convert 16-bit characters(>=0x100) to '\\uXXXX'.

- Convert 21-bit characters(>=0x10000) and surrogate pair characters to
  '\\U00xxxxxx'.

This algorithm converts any string to printable ASCII, and ``repr()`` is
used as a handy and safe way to print strings for debugging or for
logging. Although all non-ASCII characters are escaped, this does not
matter when most of the string's characters are ASCII. But for other
languages, such as Japanese where most characters in a string are not
ASCII, this is very inconvenient.

We can use ``print(aJapaneseString)`` to get a readable string, but we
have no workaround for reading strings in containers such as lists or
tuples. ``print(listOfJapaneseStrings)`` uses repr() to build the string
to be printed, so the resulting strings are always hex-escaped. Or when
``open(japaneseFilename)`` raises an exception, the error message is
something like ``IOError: [Errno 2] No such file or directory:
'\u65e5\u672c\u8a9e'``, which isn't helpful.
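
The container problem can be sketched as follows. This is only an
illustration, assuming an interpreter in which ``str`` is Unicode and the
``ascii()`` builtin proposed in this PEP is available; the names ``words``,
``as_text`` and ``escaped`` are made up for the example.

```python
# Illustrative names only; assumes an interpreter that implements this PEP.
words = ["\u65e5\u672c\u8a9e"]        # a list holding one Japanese string

# str() of a container is built from the repr() of each element, so the
# readability of the result depends entirely on what repr() escapes.
as_text = str(words)
assert as_text == "[" + repr(words[0]) + "]"

# ascii() hex-escapes every non-ASCII character, like repr() in Python 2.
escaped = ascii(words)
print(escaped)                        # ['\u65e5\u672c\u8a9e']
```

The last line shows the kind of output that the current ``repr()`` always
produces for such a list.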

Python 3000 has a lot of nice features for non-Latin users such as
non-ASCII identifiers, so it would be helpful if Python could also
progress in a similar way for printable output.

Some users might be concerned that such output will mess up their
console if they print binary data like images. But this is unlikely to
happen in practice because bytes and strings are different types in
Python 3000, so printing an image to the console won't mess it up.

This issue was once discussed by Hye-Shik Chang [1]_ , but was rejected.


Specification
=============

- Add Python API ``int PY_UNICODE_ISPRINTABLE(Py_UNICODE ch)``.
  ``PY_UNICODE_ISPRINTABLE()`` returns 0 if ``repr()`` should escape the
  Unicode character ``ch``, 1 otherwise. Characters that should be escaped
  are:

  * Characters defined in the Unicode character database as "Other"(Cc,
    Cf, Cs, Co, Cn).

  * Characters defined in the Unicode character database as "Separator"
    (Zl, Zp, Zs) other than ASCII space(0x20).

- The algorithm to build ``repr()`` strings should be changed to:

  * Convert CR, LF, TAB and '\\' to '\\r', '\\n', '\\t', '\\\\'.

  * Convert non-printable ASCII characters(0x00-0x1f, 0x7f) to '\\xXX'.

  * Convert leading surrogate pair characters without trailing character
    (0xd800-0xdbff, but not followed by 0xdc00-0xdfff) to '\\uXXXX'.

  * Convert non-printable characters(PY_UNICODE_ISPRINTABLE() returns 0)
    to '\\xXX', '\\uXXXX' or '\\U00xxxxxx'.

  * Backslash-escape quote characters(apostrophe, ') and add quote
    character at the beginning and the end.

- Set the Unicode error-handler for sys.stderr to 'backslashreplace' by
  default.

- Set the Unicode error-handler for sys.stdout in the Python interactive
  session to 'backslashreplace' by default.

- Add ``'%a'`` string format operator. ``'%a'`` converts any Python
  object to a string using ``repr()`` and then hex-escapes all non-ASCII
  characters. The ``'%a'`` operator generates the same string as ``'%r'``
  in Python 2.

- Add ``ascii()`` builtin function. ``ascii()`` converts any Python
  object to a string using ``repr()`` and then hex-escapes all non-ASCII
  characters. ``ascii()`` generates the same string as ``repr()`` in
  Python 2.

- Add ``isprintable()`` method to the string type. ``str.isprintable()``
  returns False if ``repr()`` should escape the characters in the string,
  True otherwise. The ``isprintable()`` method calls
  ``PY_UNICODE_ISPRINTABLE()`` internally.
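
A brief sketch of how the three user-visible additions would behave,
assuming an interpreter that implements them as specified here. The sample
string ``s`` is arbitrary.

```python
# Assumes an interpreter that implements the proposals listed above.
s = "caf\u00e9\t"                  # U+00E9 is printable, TAB is not

print(s.isprintable())             # False, because the TAB must be escaped
print("caf\u00e9".isprintable())   # True
print(ascii(s))                    # 'caf\xe9\t'
print("%a" % s)                    # same text as ascii(s)
```

Note that ``ascii()`` and ``'%a'`` escape both the non-ASCII letter and
the control character, while ``isprintable()`` only reports whether
anything in the string would need escaping by the new ``repr()``.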


Rationale
=========

The ``repr()`` in Python 3000 should be Unicode based, not ASCII based,
just like Python 3000 strings. Also, conversion should not be affected by
the locale setting, because the locale is not necessarily the same as the
output device's locale. For example, it is common for a daemon process
to be invoked in an ASCII setting but to write UTF-8 to its log files.
Also, web applications might want to report the error information in
a more readable form based on the HTML page's encoding.

Characters not supported by the user's console are hex-escaped on printing
by the Unicode encoder's error-handler. If the error-handler of the
output file is 'backslashreplace', such characters are hex-escaped
without raising UnicodeEncodeError. For example, if your default
encoding is ASCII, ``print('Hello ¢')`` will print 'Hello \\xa2'.
If your encoding is ISO-8859-1, 'Hello ¢' will be printed.

For non-interactive sessions, the default error-handler of sys.stdout
should be 'strict'. Other applications reading the output might not
understand hex-escaped characters, so unsupported characters should be
trapped when writing.
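
The effect of the two error-handlers can be reproduced directly with
``str.encode()``, without involving a real console. This is a minimal
sketch; it only demonstrates the codec behaviour that the paragraphs above
rely on, not the proposed changes to ``sys.stdout`` and ``sys.stderr``.

```python
text = "Hello \u00a2"     # U+00A2 CENT SIGN

# An ASCII-only device cannot encode U+00A2; 'backslashreplace' escapes it.
print(text.encode("ascii", "backslashreplace"))        # b'Hello \\xa2'

# ISO-8859-1 can encode it directly as the single byte 0xA2.
print(text.encode("iso-8859-1", "backslashreplace"))   # b'Hello \xa2'

# With the 'strict' handler the ASCII case raises instead of escaping.
try:
    text.encode("ascii", "strict")
except UnicodeEncodeError as exc:
    print("strict handler raised:", exc.reason)
```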

Printable characters
--------------------

The Unicode standard doesn't define non-printable characters, so we must
create our own definition. Here we propose to define non-printable
characters as follows.

- Non-printable ASCII characters, as in Python 2.

- Broken surrogate pair characters.

- Characters defined in the Unicode character database as

  * Cc (Other, Control)
  * Cf (Other, Format)
  * Cs (Other, Surrogate)
  * Co (Other, Private Use)
  * Cn (Other, Not Assigned)
  * Zl (Separator, Line) ('\\u2028', LINE SEPARATOR)
  * Zp (Separator, Paragraph) ('\\u2029', PARAGRAPH SEPARATOR)
  * Zs (Separator, Space) other than ASCII space('\\x20'). Characters in
    this category should be escaped to avoid ambiguity.
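
The definition above can be approximated in a few lines with the standard
``unicodedata`` module. This is only a sketch: ``is_printable`` and
``NON_PRINTABLE`` are hypothetical names, not part of the reference
implementation, and the categories reported depend on the Unicode database
shipped with the interpreter.

```python
import unicodedata

# General categories that the list above treats as non-printable.
NON_PRINTABLE = {"Cc", "Cf", "Cs", "Co", "Cn", "Zl", "Zp", "Zs"}

def is_printable(ch):
    """Approximate the proposed rule for a single character."""
    if ch == " ":                 # ASCII space is the one Zs exception
        return True
    return unicodedata.category(ch) not in NON_PRINTABLE

for ch in ("a", " ", "\t", "\u00a0", "\u2028", "\u3042"):
    print(ascii(ch), unicodedata.category(ch), is_printable(ch))
```

Lone surrogates fall under category Cs, so they are covered by the same
test without a separate check.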

Alternate Solutions
-------------------

To help debugging in non-Latin languages without changing ``repr()``,
other suggestions were made.

- Supply a tool to print lists or dicts.

  Strings to be printed for debugging are not only contained by lists or
  dicts, but also in many other types of object. File objects contain a
  file name in Unicode, exception objects contain a message in Unicode,
  etc. These strings should be printed in readable form when repr()ed.
  It is unlikely to be possible to implement a tool to print all
  possible object types.

- Use sys.displayhook and sys.excepthook.

  For interactive sessions, we can write hooks to restore hex-escaped
  characters to the original characters. But these hooks are called only
  to display the result of evaluating an expression entered in an
  interactive Python session, and don't work for the print() function,
  for non-interactive sessions or for logging.debug("%r", ...), etc.

- Subclass sys.stdout and sys.stderr.

  It is difficult to implement a subclass to restore hex-escaped
  characters since there isn't enough information left by the time it's
  a string to undo the escaping correctly in all cases. For example,
  ``print("\\"+"u0041")`` should be printed as '\\u0041', not 'A'. But
  the file object has no way to tell the two cases apart.

- Make the encoding used by ``unicode_repr()`` adjustable, and keep the
  current ``repr()`` as the default.

  With an adjustable ``repr()``, the result of ``repr()`` is unpredictable,
  which would make it impossible to write correct code involving
  ``repr()``. And if the current ``repr()`` is the default, then the old
  convention remains intact and users may expect ASCII strings as the
  result of ``repr()``. Third-party applications or libraries could choke
  when a custom ``repr()`` function is used.
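
The ambiguity behind the third alternative (subclassing ``sys.stdout``)
can be shown with a small sketch. ``naive_unescape`` is a hypothetical
stand-in for what such a stream subclass would have to do; it uses the
standard ``unicode_escape`` codec and is not a proposed API.

```python
def naive_unescape(text):
    """Hypothetical: undo backslash escapes found in already-built text."""
    return text.encode("ascii").decode("unicode_escape")

# Text the program wrote on purpose: a backslash followed by 'u0041'.
literal = "\\" + "u0041"

# Text produced by escaping a real non-ASCII character (U+0100).
escaped = "\u0100".encode("ascii", "backslashreplace").decode("ascii")

print(naive_unescape(escaped))   # restores the original character
print(naive_unescape(literal))   # A  (wrong: the backslash was intended)
```

Both inputs are plain ASCII by the time they reach the stream, so no
stream-level filter can treat them differently.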


Backwards Compatibility
=======================

Changing ``repr()`` may break some existing code, especially testing
code. Five of Python's regression tests fail with this modification. If
you need ``repr()`` strings without non-ASCII characters, as in Python 2,
you can use the following function. ::

    def repr_ascii(obj):
        return str(repr(obj).encode("ASCII", "backslashreplace"), "ASCII")

For logging or for debugging, the following code can raise UnicodeEncodeError. ::

    log = open("logfile", "w")
    log.write(repr(data))     # UnicodeEncodeError will be raised
                              # if data contains unsupported characters.

To avoid raising exceptions, you can specify the error-handler explicitly. ::

    log = open("logfile", "w", errors="backslashreplace")
    log.write(repr(data))  # Unsupported characters will be escaped.
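
The two logging cases can be tried without touching the file system by
wrapping an in-memory buffer. This is a sketch under stated assumptions:
the stream encoding is forced to ASCII to stand in for a log file whose
encoding cannot represent the data, and ``repr()`` behaves as proposed in
this PEP. The names ``strict_log``, ``raw`` and ``log`` are illustrative.

```python
import io

data = "\u65e5\u672c\u8a9e"

# Default ('strict') handler: writing repr(data) to an ASCII stream fails.
strict_log = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    strict_log.write(repr(data))
    strict_log.flush()
except UnicodeEncodeError:
    print("strict handler raised UnicodeEncodeError")

# 'backslashreplace' handler: the same write succeeds with escapes.
raw = io.BytesIO()
log = io.TextIOWrapper(raw, encoding="ascii", errors="backslashreplace")
log.write(repr(data))
log.flush()
print(raw.getvalue())     # b"'\\u65e5\\u672c\\u8a9e'"
```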


For a console with a Unicode-based encoding, for example en_US.utf8 or
de_DE.utf8, the backslashreplace trick doesn't work and no printable
characters are escaped. This causes a problem with similar-looking
characters in Western, Greek and Cyrillic languages. These
languages use similar (but different) alphabets (descended from a
common ancestor) and contain letters that look similar but have different
character codes. For example, it is hard to distinguish Latin 'a', 'e'
and 'o' from Cyrillic 'а', 'е' and 'о'. (The visual representation, of
course, very much depends on the fonts used, but usually these letters
are almost indistinguishable.) To avoid the problem, users can adjust
the terminal encoding to get a result suitable for their environment,
or use ``repr_ascii()`` described above.
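
A small sketch of how look-alike letters can still be told apart when the
glyphs themselves are displayed, assuming the ``ascii()`` builtin proposed
in this PEP. The variable names are illustrative.

```python
import unicodedata

latin, cyrillic = "a", "\u0430"

print(latin == cyrillic)                # False, although the glyphs look alike
print(ascii(latin), ascii(cyrillic))    # 'a' '\u0430'
print(unicodedata.name(latin))          # LATIN SMALL LETTER A
print(unicodedata.name(cyrillic))       # CYRILLIC SMALL LETTER A
```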


Open Issues
===========

- Is the ``ascii()`` function necessary, or is documentation enough? If
  necessary, should ``ascii()`` belong to the builtin namespace?


Rejected Proposals
==================

- Add encoding and errors arguments to the builtin print() function,
  with defaults of sys.getfilesystemencoding() and 'backslashreplace'.

  Complicated to implement, and in general, this does not seem to be a
  good idea. [2]_

- Use character names to escape characters, instead of hex character
  codes. For example, ``repr('\u03b1')`` can be converted to
  ``"\N{GREEK SMALL LETTER ALPHA}"``.

  Using character names gets verbose compared to hex-escape. e.g.,
  ``repr("\ufbf9")`` is converted to ``"\N{ARABIC LIGATURE UIGHUR KIRGHIZ
  YEH WITH HAMZA ABOVE WITH ALEF MAKSURA ISOLATED FORM}"``.
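
The difference in length between the two escape styles can be measured
with the standard ``unicodedata`` module. ``name_escape`` is a
hypothetical helper written for this comparison only; it is not part of
any proposal.

```python
import unicodedata

def name_escape(ch):
    """Hypothetical helper: render one character as a backslash-N escape."""
    return "\\N{" + unicodedata.name(ch) + "}"

for ch in ("\u03b1", "\ufbf9"):
    hex_form = ascii(ch)[1:-1]          # strip the surrounding quotes
    named = name_escape(ch)
    print(len(hex_form), hex_form)
    print(len(named), named)
```

The hex form is always six characters for a code point in the Basic
Multilingual Plane above U+00FF, while the named form grows with the
length of the character name.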


Reference Implementation
========================

http://bugs.python.org/issue2630


References
==========

.. [1] Multibyte string on string\::string_print
        (http://bugs.python.org/issue479898)

.. [2] [Python-3000] Displaying strings containing unicode escapes
        (http://mail.python.org/pipermail/python-3000/2008-April/013366.html)


Copyright
=========

This document has been placed in the public domain.

From stephen at xemacs.org  Sun May 25 10:03:19 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Sun, 25 May 2008 17:03:19 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<797440730805221846g1544e09di7c01434ef62933f5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
Message-ID: <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>

Jim Jewett writes:

 > >  Otherwise, they might grumble about the fact that what
 > >  they're seeing isn't English, but it doesn't matter whether it's
 > >  hex-escaped or kanji.
 > 
 > I'm more worried that it might look like English, yet be subtly (and
 > importantly) different.

Let me remind you that I advocated that position, and (1) Martin shot
me down hard, and (2) Guido indicated that it is a point, but he now
seems happy enough not to worry about it.  If you're serious about
that, you need to pick up the ball; I'm not comfortable advocating it,
especially in view of the wide variety of cases where it seems to be
used for something other than diagnosing normally invisible features
of output.

 > I just want it to be very easy to say "on my system, repr is ASCII".

That is in all proposals.

 > I would prefer that ASCII also be the default, so that people who want
 > more characters opt in to receive them,

Well, the people who want more characters include all non-Americans
and some large fraction of Americans.  I don't see how "Better Is
Better" can possibly beat "Worse Is Better" here, given the extent to
which repr is used to produce output meaningful to end-users
(vs. diagnostics for application and/or Python maintainers).

From lists at cheimes.de  Sun May 25 16:59:24 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 25 May 2008 16:59:24 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
Message-ID: <48397ECC.9070805@cheimes.de>

Hello!

The first set of betas of Python 2.6 and 3.0 is approaching fast. I'd like
to grab the final chance to clean up the C API of 2.6 and 3.0. I know, I
know, I brought up the topic two times in the past. But this time I mean
it for real! :]

Last time Guido said:
---
I think it can actually be simplified. I think maintaining binary
compatibility between 2.6 and earlier versions is hopeless anyway, so
we might as well just rename PyString to PyBytes in 2.6 and 3.0, and
have an extra set of macros so that code using PyString needs to be
recompiled but not otherwise touched. E.g.

typedef { ... } PyBytesObject;
#define PyStringObject PyBytesObject

... PyString_Type;
#define PyBytes_Type PyString_Type

<etc>
---

I'd like to follow Guido's advice and change the code as follows:

 * replace PyBytes_ with PyByteArray_
 * replace PyString_ with PyBytes_
 * rename bytesobject.[ch] to bytearrayobject.[ch]
 * rename stringobject.[ch] to bytesobject.[ch]
 * add a new file stringobject.h which contains the aliases PyString_ ->
   PyBytes_

Christian

From lists at cheimes.de  Sun May 25 17:28:53 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 25 May 2008 17:28:53 +0200
Subject: [Python-3000] Please svnmerge your changes
Message-ID: <483985B5.6020705@cheimes.de>

Hello fellow developers!

I've been busy with personal work in the past weeks. At present I'm
still moving into my new apartment. It has been a real challenge to
install an IKEA kitchen in a house built before WW2 all by myself. On
the one hand it's fun but on the other hand it costs me most of my free
time at night. At least this building has a shelter in its cellar so I'm
mostly protected in the case of an air strike. *g*

In order to get all code merged before the first betas I need your help.
Please everybody grab a couple of your checkins and merge them yourself.

You can find the list of required merges at http://rafb.net/p/cghbTk63.html

Christian

From stefan_ml at behnel.de  Sun May 25 17:56:29 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sun, 25 May 2008 17:56:29 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID: <g1c27c$nt2$1@ger.gmane.org>

Hi,

Christian Heimes wrote:
>  * add a new file stringobject.h which contains the aliases PyString_ ->
> PyBytes_

will that be included by Python.h by default?

Stefan


From lists at cheimes.de  Sun May 25 18:16:52 2008
From: lists at cheimes.de (Christian Heimes)
Date: Sun, 25 May 2008 18:16:52 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <g1c27c$nt2$1@ger.gmane.org>
References: <48397ECC.9070805@cheimes.de> <g1c27c$nt2$1@ger.gmane.org>
Message-ID: <483990F4.30802@cheimes.de>

Stefan Behnel wrote:
> will that be included by Python.h by default?

Only in Python 2.6

Christian

From g.brandl at gmx.net  Sun May 25 21:21:26 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sun, 25 May 2008 21:21:26 +0200
Subject: [Python-3000] dbm package creation
Message-ID: <g1ce83$rb6$1@ger.gmane.org>

Hi,

I'll handle the PEP 3108 dbm package if nobody else is already at it.

Two questions though:

* the whichdb() function returns strings that are module names.  These
   names won't be importable anymore in 3k.  Should the return values
   remain the same in 3k, or should whichdb() return the new names, and
   if the latter, including "dbm." or not?

* two of the previous modules are C modules, namely dbm and gdbm.  They
   can't be easily moved into the package.  I expect the solution is to
   create stub Python modules and rename the C modules with a leading
   underscore? (It's already like this for bsd, except that the C module
   name, bsddb, has no underscore.)

cheers,
Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From brett at python.org  Mon May 26 00:02:32 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 25 May 2008 15:02:32 -0700
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID: <bbaeab100805251502w52f12c9fl5c0328d1964d6659@mail.gmail.com>

On Sun, May 25, 2008 at 7:59 AM, Christian Heimes <lists at cheimes.de> wrote:
> Hello!
>
> The first set of betas of Python 2.6 and 3.0 is fast apace. I like to
> grab the final chance and clean up the C API of 2.6 and 3.0. I know, I
> know, I brought up the topic two times in the past. But this time I mean
> it for real! :]
>
> Last time Guido said:
> ---
> I think it can actually be simplified. I think maintaining binary
> compatibility between 2.6 and earlier versions is hopeless anyway, so
> we might as well just rename PyString to PyBytes in 2.6 and 3.0, and
> have an extra set of macros so that code using PyString needs to be
> recompiled but not otherwise touched. E.g.
>
> typedef { ... } PyBytesObject;
> #define PyStringObject PyBytesObject
>
> ... PyString_Type;
> #define PyBytes_Type PyString_Type
>
> <etc>
> ---
>
> I like to follow Guido's advice and change the code as following:
>
>  * replace PyBytes_ with PyByteArray_
>  * replace PyString with PyBytes_
>  * rename bytesobject.[ch] to bytearrayobject.[ch]
>  * rename stringobject.[ch] to bytesobject.[ch]
>  * add a new file stringobject.h which contains the aliases PyString_ ->
> PyBytes_

+1 from me.

-Brett

From brett at python.org  Mon May 26 00:04:32 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 25 May 2008 15:04:32 -0700
Subject: [Python-3000] Please svnmerge your changes
In-Reply-To: <483985B5.6020705@cheimes.de>
References: <483985B5.6020705@cheimes.de>
Message-ID: <bbaeab100805251504p75f29d7fu23ee389c3e95b983@mail.gmail.com>

On Sun, May 25, 2008 at 8:28 AM, Christian Heimes <lists at cheimes.de> wrote:
> Hello fellow developers!
>
> I've been busy with personal work in the past weeks. At present I'm
> still moving into my new apartment. It has been a real challenge to
> install an IKEA kitchen in a house built before WW2 all by myself. On
> the one hand it's fun but on the other hand it costs me most of my free
> time at night. At least this building has a shelter in its cellar so I'm
> mostly protected in the case of an air strike. *g*
>
> In order to get all code merged before the first betas I need your help.
> Please everybody grab a couple of your checkins and merge them yourself.
>
> You can find the list of required merges at http://rafb.net/p/cghbTk63.html

For stuff from the sandbox, would it help at all to block them
explicitly even though they shouldn't get merged at all?

-Brett

From brett at python.org  Mon May 26 00:08:34 2008
From: brett at python.org (Brett Cannon)
Date: Sun, 25 May 2008 15:08:34 -0700
Subject: [Python-3000] dbm package creation
In-Reply-To: <g1ce83$rb6$1@ger.gmane.org>
References: <g1ce83$rb6$1@ger.gmane.org>
Message-ID: <bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>

On Sun, May 25, 2008 at 12:21 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> Hi,
>
> I'll handle the PEP 3108 dbm package if nobody else is already at it.
>

I know I have not started the work.

> Two questions though:
>
> * the whichdb() function returns strings that are module names.  These
>  names won't be importable anymore in 3k.  Should the return values
>  remain the same in 3k, or should whichdb() return the new names, and
>  if the latter, including "dbm." or not?
>

New names with the package name prepended.

Should probably change the API at some point to just return the module
to use instead of the name.
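
For reference, a minimal sketch of what "new names with the package name
prepended" looks like in practice. It assumes a current CPython 3 where
dbm.whichdb() exists and the pure-Python dbm.dumb backend is available. The
file name "example_db" and the temporary directory are illustrative only.
Passing the returned name to importlib.import_module() only approximates the
"return the module" idea; that step is not part of the dbm API itself.

```python
import dbm
import dbm.dumb
import importlib
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example_db")  # hypothetical file name

    # Create a small database with the pure-Python backend.
    db = dbm.dumb.open(path, "c")
    db[b"key"] = b"value"
    db.close()

    # whichdb() reports the backend as a dotted, importable module name.
    backend_name = dbm.whichdb(path)

# Resolve the dotted name to the module object.
backend_module = importlib.import_module(backend_name)
print(backend_name, backend_module.__name__)
```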

> * two of the previous modules are C modules, namely dbm and gdbm.  They
>  can't be easily moved into the package.  I expect the solution is to
>  create stub Python modules and rename the C modules with a leading
>  underscore? (It's already like this for bsd, except that the C module
>  name, bsddb, has no underscore.)
>

Yep. I don't know of any package in the stdlib that uses an extension
module in some other fashion.

-Brett

From humitos at gmail.com  Mon May 26 01:41:41 2008
From: humitos at gmail.com (Manuel Kaufmann)
Date: Sun, 25 May 2008 20:41:41 -0300
Subject: [Python-3000] Hello World!
Message-ID: <200805252041.41477.humitos@gmail.com>

Hi, I'm Manuel Kaufmann (aka humitos) and I'm a student of "Systems
Engineering" in Santa Fé (Capital), Argentina.

I met Python at the "1ª Jornadas de Python en Santa Fé" three years ago, and
I'm happy with it. As you can see, I don't speak English very well, but I try
to make myself understood.

"Saludos!"

-- 
Kaufmann Manuel
Blog: http://humitos.wordpress.com/
PyAr: http://www.python.com.ar/

From tjreedy at udel.edu  Mon May 26 02:22:43 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 25 May 2008 20:22:43 -0400
Subject: [Python-3000] Hello World!
References: <200805252041.41477.humitos@gmail.com>
Message-ID: <g1cvsi$7so$1@ger.gmane.org>


"Manuel Kaufmann" <humitos at gmail.com> wrote in message 
news:200805252041.41477.humitos at gmail.com...
| Hi, I'm Manuel Kaufmann (aka humitos) and I'm a student of "Systems
| Engineering" in Santa Fé (Capital), Argentina.
|
| I met Python at the "1ª Jornadas de Python en Santa Fé" three years ago,
| and I'm happy with it. As you can see, I don't speak English very well,
| but I try to make myself understood.

Hi Manuel,
This list (Python 3000 devel) is for discussion of development of a 
*future* version of Python, primarily by the developers of that version.

For general discussion of Python, please address python-list or 
comp.lang.python.


From humitos at gmail.com  Mon May 26 02:34:45 2008
From: humitos at gmail.com (Manuel Kaufmann)
Date: Sun, 25 May 2008 21:34:45 -0300
Subject: [Python-3000] Hello World!
In-Reply-To: <g1cvsi$7so$1@ger.gmane.org>
References: <200805252041.41477.humitos@gmail.com> <g1cvsi$7so$1@ger.gmane.org>
Message-ID: <200805252134.46034.humitos@gmail.com>

On Sunday 25 May 2008 21:22:43 Terry Reedy wrote:
> Hi Manuel,
> This list (Python 3000 devel) is for discussion of development of a
> *future* version of Python, primarily by the developers of that version.

Yes, I know that. I subscribed because I want to discuss this issue[1].
I made a patch for it, but I'm not sure whether it's correct.

[1] http://bugs.python.org/issue2888

-- 
Kaufmann Manuel
Blog: http://humitos.wordpress.com/
PyAr: http://www.python.com.ar/

From tjreedy at udel.edu  Mon May 26 03:54:49 2008
From: tjreedy at udel.edu (Terry Reedy)
Date: Sun, 25 May 2008 21:54:49 -0400
Subject: [Python-3000] Hello World!
References: <200805252041.41477.humitos@gmail.com> <g1cvsi$7so$1@ger.gmane.org>
	<200805252134.46034.humitos@gmail.com>
Message-ID: <g1d598$i75$1@ger.gmane.org>


"Manuel Kaufmann" <humitos at gmail.com> wrote in message 
news:200805252134.46034.humitos at gmail.com...
| On Sunday 25 May 2008 21:22:43 Terry Reedy wrote:
| > Hi Manuel,
| > This list (Python 3000 devel) is for discussion of development of a
| > *future* version of Python, primarily by the developers of that
| > version.
|
| Yes, I know that. I subscribed because I want to discuss this issue[1].
| I made a patch for it, but I'm not sure whether it's correct.
|
| [1] http://bugs.python.org/issue2888

In that case, have more patience and give the tracker discussion process 
more time. 


From humitos at gmail.com  Mon May 26 06:01:07 2008
From: humitos at gmail.com (Manuel Kaufmann)
Date: Mon, 26 May 2008 01:01:07 -0300
Subject: [Python-3000] Hello World!
In-Reply-To: <g1d598$i75$1@ger.gmane.org>
References: <200805252041.41477.humitos@gmail.com>
	<200805252134.46034.humitos@gmail.com> <g1d598$i75$1@ger.gmane.org>
Message-ID: <200805260101.07655.humitos@gmail.com>

On Sunday 25 May 2008 22:54:49 Terry Reedy wrote:
> In that case, have more patience and give the tracker discussion process
> more time.

Sorry, I didn't explain myself clearly.

I want to discuss "How should pprint.pprint work?" or "Why doesn't py3k
behave according to the documentation?[1]"

I prefer the new way of showing it. Sometimes I don't like the old style.
Example (in py2.6):
>>> import pprint
>>> stuff = [1,2,3]
>>> pprint.pprint(stuff, indent=4)
[   1, 2, 3]
>>> stuff.insert(0, stuff[:])
>>> pprint.pprint(stuff, indent=4)
[   [   1, 2, 3], 1, 2, 3]
>>> stuff.insert(0, stuff[:])
>>> pprint.pprint(stuff, indent=4)
[   [   [   1, 2, 3], 1, 2, 3], [   1, 2, 3], 1, 2, 3]

I prefer this one (in py3k):
>>> import pprint
>>> stuff = [1,2,3]
>>> stuff.insert(0, stuff[:])
>>> pprint.pprint(stuff, indent=4)
[[1, 2, 3], 1, 2, 3]
>>> stuff.insert(0, stuff[:])
>>> pprint.pprint(stuff, indent=4)
[[[1, 2, 3], 1, 2, 3], [1, 2, 3], 1, 2, 3]
>>> 

Now, if py3k is working as intended, the documentation should be fixed.
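
As a point of comparison, here is a small sketch of how pprint.pformat()
treats "indent" on a current CPython 3 (an assumption; behaviour at the time
of this thread may have differed). A list whose repr fits within "width" is
emitted on one line without indent padding, while a list that has to be
wrapped gets the padding after the opening bracket. The width=5 value is
chosen only to force wrapping.

```python
import pprint

stuff = [1, 2, 3]

# Fits on one line: indent does not affect the output.
one_line = pprint.pformat(stuff, indent=4)

# Too narrow for one line: wrapping is forced and indent is applied.
wrapped = pprint.pformat(stuff, indent=4, width=5)

print(one_line)
print(wrapped)
```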

-- 
Kaufmann Manuel
Blog: http://humitos.wordpress.com/
PyAr: http://www.python.com.ar/

From g.brandl at gmx.net  Mon May 26 11:14:06 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 26 May 2008 11:14:06 +0200
Subject: [Python-3000] dbm package creation
In-Reply-To: <bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>
References: <g1ce83$rb6$1@ger.gmane.org>
	<bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>
Message-ID: <g1duqj$eqe$1@ger.gmane.org>

Brett Cannon wrote:
> On Sun, May 25, 2008 at 12:21 PM, Georg Brandl <g.brandl at gmx.net> wrote:
>> Hi,
>>
>> I'll handle the PEP 3108 dbm package if nobody else is already at it.
>>
> 
> I know I have not started the work.
> 
>> Two questions though:
>>
>> * the whichdb() function returns strings that are module names.  These
>>  names won't be importable anymore in 3k.  Should the return values
>>  remain the same in 3k, or should whichdb() return the new names, and
>>  if the latter, including "dbm." or not?
>>
> 
> New names with the package name prepended.
> 
> Should probably change the API at some point to just return the module
> to use instead of the name.
> 
>> * two of the previous modules are C modules, namely dbm and gdbm.  They
>>  can't be easily moved into the package.  I expect the solution is to
>>  create stub Python modules and rename the C modules with a leading
>>  underscore? (It's already like this for bsd, except that the C module
>>  name, bsddb, has no underscore.)
>>
> 
> Yep. I don't know of any package in the stdlib that uses an extension
> module in some other fashion.

Okay, that's settled then!

Georg


From solipsis at pitrou.net  Mon May 26 11:42:35 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 26 May 2008 09:42:35 +0000 (UTC)
Subject: [Python-3000] Exception re-raising woes
Message-ID: <loom.20080526T091655-961@post.gmane.org>


Hello all,

Trying to fix #2507 (Exception state lives too long in 3.0) has uncovered new
issues with the bare "raise" statement when used in exception block nesting
situations (see #2833: __exit__ silences the active exception). I say
"uncovered" rather than "created" since, as Amaury points out in the latter bug
entry, re-raising behaviour has always been a bit limited or non-obvious.

Witness the following code:

   try:
      raise Exception("foo")
   except Exception:
      try: raise KeyError("caught")
      except KeyError: pass
      raise

With python 2.x and py3k pre-r62847, it would re-raise KeyError("caught")
(whereas the intuitive behaviour would be to re-raise Exception("foo")).
With py3k post-r62847, it now raises a "RuntimeError: No active
exception to reraise".

Note that in py3k at least, we can get the "correct" behaviour by writing
instead:

   try:
      raise Exception("foo")
   except Exception as e:
      try: raise KeyError("caught")
      except KeyError: pass
      raise e

The only slight annoyance being that the re-raising statement ("raise e") is
added at the end of the original traceback.
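
A quick way to check which exception a bare "raise" picks up is to run the
first snippet inside a function and capture what propagates. The sketch below
assumes a current CPython 3 interpreter, where the enclosing handler's
exception is restored once the nested handler finishes; the function name
reraise_after_nested_handler is made up for illustration.

```python
def reraise_after_nested_handler():
    try:
        raise Exception("foo")
    except Exception:
        try:
            raise KeyError("caught")
        except KeyError:
            pass
        # Bare raise: re-raises whatever the interpreter considers
        # the currently handled exception at this point.
        raise


try:
    reraise_after_nested_handler()
except Exception as exc:
    reraised = exc

print(type(reraised).__name__, reraised)
```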

There are other funny situations. Just try (with any Python version):

def except_yield():
    try:
        raise Exception("foo")
    except:
        yield 1
        raise
list(except_yield())
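
To observe what the generator case does without the traceback ending the
program, the generator can be driven step by step. This is a sketch assuming
a current CPython 3, where the handled exception is preserved across the
"yield"; older interpreters may behave differently, which is the point of
the example above.

```python
def except_yield():
    try:
        raise Exception("foo")
    except:
        yield 1
        raise


gen = except_yield()
first = next(gen)       # runs up to the yield inside the handler

try:
    next(gen)           # resumes after the yield and hits the bare raise
except Exception as exc:
    outcome = exc

print(first, type(outcome).__name__, outcome)
```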


The problem with properly fixing the bare "raise" statement is that right now,
the saved exception state is a member of the frame object. That is, there is no
proper stacking of exception states when some lexically nested exception
handlers are involved in the same frame.

Now perhaps it is time to think about fixing that problem, without losing the
expected properties of exceptions in py3k. I propose the following changes:

- an "except" block now also becomes a block in ceval.c terms, that is, a
specific PyTryBlock is pushed at its beginning (please note that right now
SETUP_EXCEPT, despite its name, encloses the "try" block rather than any
"except" statement)
- this specific PyTryBlock - let's name it EXCEPT_HANDLER - is created
implicitly, not explicitly through an opcode; this is necessary because it must
be created *before* setting the current exception state to the caught
exception; waiting for an opcode to be executed would be too late
- before pushing this EXCEPT_HANDLER on the block stack, the current thread's
exception state (that is, before the exception is caught) is saved on the frame
stack (that is, the three objects representing the type, value and traceback
respectively)
- an EXCEPT_HANDLER block is unwound explicitly with a dedicated POP_EXCEPT
opcode at the end of the exception handler; this opcode not only unwinds the
block as POP_BLOCK does, but also pops and restores the exception state which
was saved on the stack before pushing the block
- an EXCEPT_HANDLER block, when it is unwound implicitly because of a control
transfer (e.g. "return" or "continue" or "break" or "raise"), follows the same
treatment as in the POP_EXCEPT opcode: that is, in addition to unwinding the
block, it also pops and restores the previous exception state
- the current set_exc_info() / reset_exc_info() machinery is yanked, since it is
not useful anymore; this also probably removes three fields in the frame object,
because it does not need to contain the previous exception state anymore
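
The save/restore discipline in the list above can be modelled in a few lines
of Python. This is only a toy model under stated assumptions, not the ceval.c
implementation: ThreadState, Frame, enter_except_handler() and pop_except()
are hypothetical names, and exceptions are represented by plain strings.

```python
class ThreadState:
    """Toy stand-in for the per-thread 'current exception' slot."""

    def __init__(self):
        self.current = None


class Frame:
    """Toy frame with a value stack and a block stack."""

    def __init__(self, tstate):
        self.tstate = tstate
        self.value_stack = []   # saved exception states are kept here
        self.block_stack = []   # blocks pushed while executing

    def enter_except_handler(self, caught):
        # Save the previous state *before* installing the caught exception.
        self.value_stack.append(self.tstate.current)
        self.block_stack.append("EXCEPT_HANDLER")
        self.tstate.current = caught

    def pop_except(self):
        # Unwind the block and restore the state saved on entry.
        block = self.block_stack.pop()
        assert block == "EXCEPT_HANDLER"
        self.tstate.current = self.value_stack.pop()


tstate = ThreadState()
frame = Frame(tstate)

frame.enter_except_handler("Exception('foo')")     # outer handler
frame.enter_except_handler("KeyError('caught')")   # nested handler
frame.pop_except()
after_inner = tstate.current    # outer exception is current again
frame.pop_except()
after_outer = tstate.current    # nothing is being handled any more

print(after_inner, after_outer)
```

With this stacking, a bare "raise" placed after the nested handler would see
the outer exception, which is the behaviour the proposal is after.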

I've not studied the "with" statement implementation. Chances are it should
also be adapted to follow the principles above. I may also be missing other
annoying "details" :-)

What do you think?

Regards

Antoine.


From musiccomposition at gmail.com  Mon May 26 14:35:24 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Mon, 26 May 2008 07:35:24 -0500
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID: <1afaf6160805260535l1fd5d136vbdf2d24b1380e2be@mail.gmail.com>

On Sun, May 25, 2008 at 9:59 AM, Christian Heimes <lists at cheimes.de> wrote:
>
> I'd like to follow Guido's advice and change the code as follows:
>
>  * replace PyBytes_ with PyByteArray_
>  * replace PyString with PyBytes_
>  * rename bytesobject.[ch] to bytearrayobject.[ch]
>  * rename stringobject.[ch] to bytesobject.[ch]
>  * add a new file stringobject.h which contains the aliases PyString_ ->
> PyBytes_

+1

Do you need any help?
>
> Christian


-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From mal at egenix.com  Mon May 26 15:29:07 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 26 May 2008 15:29:07 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID: <483ABB23.6050900@egenix.com>

On 2008-05-25 16:59, Christian Heimes wrote:
> Hello!
> 
> The first set of betas of Python 2.6 and 3.0 is fast approaching. I'd like to
> grab the final chance and clean up the C API of 2.6 and 3.0. I know, I
> know, I brought up the topic two times in the past. But this time I mean
> it for real! :]
> 
> Last time Guido said:
> ---
> I think it can actually be simplified. I think maintaining binary
> compatibility between 2.6 and earlier versions is hopeless anyway, so
> we might as well just rename PyString to PyBytes in 2.6 and 3.0, and
> have an extra set of macros so that code using PyString needs to be
> recompiled but not otherwise touched. E.g.
> 
> typedef { ... } PyBytesObject;
> #define PyStringObject PyBytesObject
> 
> ... PyString_Type;
> #define PyBytes_Type PyString_Type
> 
> <etc>
> ---
> 
> I'd like to follow Guido's advice and change the code as follows:
> 
>  * replace PyBytes_ with PyByteArray_
>  * replace PyString with PyBytes_
>  * rename bytesobject.[ch] to bytearrayobject.[ch]
>  * rename stringobject.[ch] to bytesobject.[ch]
>  * add a new file stringobject.h which contains the aliases PyString_ ->
> PyBytes_

Since this is a major break in the Python C API, please make sure
that you bump the Python C API level used for module imports.

Most imports will fail anyway at the link stage, since PyString_* APIs
are probably the most used C APIs in Python extensions.

One detail I'm worried about is the change of the type name, since
that is sometimes used in object serialization or proxy implementations.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 26 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania            41 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From lists at cheimes.de  Mon May 26 15:43:57 2008
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 26 May 2008 15:43:57 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <1afaf6160805260535l1fd5d136vbdf2d24b1380e2be@mail.gmail.com>
References: <48397ECC.9070805@cheimes.de>
	<1afaf6160805260535l1fd5d136vbdf2d24b1380e2be@mail.gmail.com>
Message-ID: <483ABE9D.3070004@cheimes.de>

Benjamin Peterson wrote:
> On Sun, May 25, 2008 at 9:59 AM, Christian Heimes <lists at cheimes.de> wrote:
>> I'd like to follow Guido's advice and change the code as follows:
>>
>>  * replace PyBytes_ with PyByteArray_
>>  * replace PyString with PyBytes_
>>  * rename bytesobject.[ch] to bytearrayobject.[ch]
>>  * rename stringobject.[ch] to bytesobject.[ch]
>>  * add a new file stringobject.h which contains the aliases PyString_ ->
>> PyBytes_
> 
> +1
> 
> Do you need any help?

I've renamed the functions and modules. Can you help me with updating
the C API docs? In Python 2.6 the docs must still use PyString but you
can add a note that PyBytes_ works, too.

Christian

From lists at cheimes.de  Mon May 26 15:40:31 2008
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 26 May 2008 15:40:31 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483ABB23.6050900@egenix.com>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com>
Message-ID: <483ABDCF.8000105@cheimes.de>

M.-A. Lemburg wrote:
> Most imports will fail anyway at the link stage, since PyString_* APIs
> are probably the most used C APIs in Python extensions.

I think you have missed an important point. In Python 2.6 the names stay
the same for the linker. Although the functions are now called
PyBytes_Egg, they are redefined to PyString_Egg by a second header file.

In Python 2.6 the renaming of PyString is purely for consistency with
the new Python 3.0 names. The names for PyString stay the same for
external code like the library and extension modules.

PyBytes -> PyByteArray is a different story, though.

> One detail I'm worried about is the change of the type name, since
> that is sometimes used in object serialization or proxy implementations.

The type names aren't changed either. They are still "str" and "bytearray"
in Python 2.6.

(moved down)
> Since this is a major break in the Python C API, please make sure
> that you bump the Python C API level used for module imports.

Do you still think it's necessary to bump up the C API version level?

Christian

From mal at egenix.com  Mon May 26 17:03:20 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Mon, 26 May 2008 17:03:20 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483ABDCF.8000105@cheimes.de>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com>
	<483ABDCF.8000105@cheimes.de>
Message-ID: <483AD138.7000804@egenix.com>

On 2008-05-26 15:40, Christian Heimes wrote:
> M.-A. Lemburg wrote:
>> Most imports will fail anyway at the link stage, since PyString_* APIs
>> are probably the most used C APIs in Python extensions.
> 
> I think you have missed an important point. In Python 2.6 the names stay
> the same for the linker. Although the functions are now called
> PyBytes_Egg, they are redefined to PyString_Egg by a second header file.
 >
> In Python 2.6 the renaming of PyString is purely for consistency with
> the new Python 3.0 names. The names for PyString stay the same for
> external code like the library and extension modules.

Isn't that an awfully confusing approach?

Wouldn't it be better to keep the PyString APIs and definitions in
stringobject.c|h and only add a new bytesobject.h header file that #defines
the PyBytes APIs in terms of the PyString APIs? That maintains
backwards compatibility and allows Python internals to use the
new API names.

With your approach, you've basically backported the confusing
notion in Py3k that str() maps to PyUnicode, only that in Py2
str() will now map to PyBytes.

You'd have to add an alias bytes -> str to the builtins to
at least reduce the confusion a bit.

However, that's bound to cause even more problems, since people
will start using bytes() instead of str() in Py2 applications
and as a result they won't run in older Python versions anymore.

The same problem applies to Py2 extension writers who wish
to support older Python releases as well.

> PyBytes -> PyByteArray is a different story, though.

PyBytes was new in 2.6 anyway, so there's no breakage there.

>> One detail I'm worried about is the change of the type name, since
>> that is sometimes used in object serialization or proxy implementations.
> 
> The type names aren't changed either. They are still "str" and "bytearray"
> in Python 2.6.

Good.

> (moved down)
>> Since this is a major break in the Python C API, please make sure
>> that you bump the Python C API level used for module imports.
> 
> Do you still think it's necessary to bump up the C API version level?

Yes, but please let's first discuss this some more. I don't think
that the timing was right.... you started this thread just yesterday
and the patches are already checked in.

-- 
Marc-Andre Lemburg
eGenix.com


From g.brandl at gmx.net  Mon May 26 17:12:30 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 26 May 2008 17:12:30 +0200
Subject: [Python-3000] http package: _FooCookieJar modules?
Message-ID: <g1ejqk$lpm$1@ger.gmane.org>

dbm and xmlrpc are done, now I'm at the http package, and have a
question:

Is there any reason to keep the _MozillaCookieJar and _LWPCookieJar
modules separate from http.cookiejar? I'd rather merge them into
http.cookiejar and have two less strangely named modules.

cheers,
Georg


From fumanchu at aminus.org  Mon May 26 18:29:26 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Mon, 26 May 2008 09:29:26 -0700
Subject: [Python-3000] Exception re-raising woes
In-Reply-To: <loom.20080526T091655-961@post.gmane.org>
References: <loom.20080526T091655-961@post.gmane.org>
Message-ID: <F1962646D3B64642B7C9A06068EE1E640366F8FA@ex10.hostedexchange.local>

Antoine Pitrou wrote:
> Trying to fix #2507 (Exception state lives too long in 3.0) has
> uncovered new issues with the bare "raise" statement when used
> in exception block nesting situations (see #2833: __exit__
> silences the active exception). I say "uncovered" rather than
> "created" since, as Amaury points out in the latter bug entry,
> re-raising behaviour has always been a bit limited or non-obvious.
> 
> Witness the following code:
> 
>    try:
>       raise Exception("foo")
>    except Exception:
>       try: raise KeyError("caught")
>       except KeyError: pass
>       raise
> 
> With python 2.x and py3k pre-r62847, it would re-raise 
> KeyError("caught") (whereas the intuitive behaviour would
> be to re-raise Exception("foo")). With py3k post-r62847,
> it now raises a "RuntimeError: No active exception to
> reraise".
> 
> Note that in py3k at least, we can get the "correct" behaviour by
> writing instead:
> 
>    try:
>       raise Exception("foo")
>    except Exception as e:
>       try: raise KeyError("caught")
>       except KeyError: pass
>       raise e
> 
> The only slight annoyance being that the re-raising statement
> ("raise e") is added at the end of the original traceback.

I wouldn't call that either "incorrect" or "non-obvious".
It certainly hasn't been a burden in Python 2.x.

> There are other funny situations. Just try (with any Python version):
> 
> def except_yield():
>     try:
>         raise Exception("foo")
>     except:
>         yield 1
>         raise
> list(except_yield())
> 
> The problem with properly fixing the bare "raise" statement is that
> right now, the saved exception state is a member of the frame object.
> That is, there is no proper stacking of exception states when some
> lexically nested exception handlers are involved in the same frame.
> 
> Now perhaps it is time to think about fixing that problem, without
> losing the expected properties of exceptions in py3k. I propose
> the following changes:
> 
> - an "except" block now also becomes a block in ceval.c terms,
> that is, a specific PyTryBlock is pushed at its beginning (please
> note that right now SETUP_EXCEPT, despite its name, encloses the
> "try" block rather than any "except" statement)
> [snip lots more changes]

That seems like an awful lot of work and change just to trade the above
problem for a new one:

    try:
       raise Exception("foo")
    except Exception as e:
       try: raise KeyError("caught")
       except KeyError as x: pass
       raise x

In either case, it's easy enough to bind the exception to a name--easier
in 2.6/3k with the abolition of string exceptions (since "except
BaseException" now catches everything).


Robert Brewer
fumanchu at aminus.org


From solipsis at pitrou.net  Mon May 26 18:48:54 2008
From: solipsis at pitrou.net (Antoine)
Date: Mon, 26 May 2008 18:48:54 +0200 (CEST)
Subject: [Python-3000] Exception re-raising woes
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E640366F8FA@ex10.hostedexchange.local>
References: <loom.20080526T091655-961@post.gmane.org>
	<F1962646D3B64642B7C9A06068EE1E640366F8FA@ex10.hostedexchange.local>
Message-ID: <47085.192.165.213.18.1211820534.squirrel@webmail.nerim.net>


Hi,

>> The only slight annoyance being that the re-raising statement
>> ("raise e") is added at the end of the original traceback.
>
> I wouldn't call that either "incorrect" or "non-obvious".

What are you talking about exactly? :)
The fact that in 2.x the last caught exception is re-raised even after the
end of the except block which caught it, rather than the exception caught
by the lexically enclosing block?

Anyway, in 3.x this behaviour will be impossible to mimic, since by
specification the exception state must disappear at the end of the except
block.

> That seems like an awful lot of work and change just to trade the above
> problem for a new one:
>
>     try:
>        raise Exception("foo")
>     except Exception as e:
>        try: raise KeyError("caught")
>        except KeyError as x: pass
>        raise x

The snippet above will not work under 3.x *by design* (the "x" variable
disappears at the end of the except block), there is even a test for it in
test_exceptions.py :-)
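
The name-unbinding rule can be checked directly. The sketch below assumes a
current CPython 3 and wraps the snippet in a function (raise_unbound_name is
a made-up name) so the resulting error can be captured instead of ending the
program.

```python
def raise_unbound_name():
    try:
        raise Exception("foo")
    except Exception as e:
        try:
            raise KeyError("caught")
        except KeyError as x:
            pass
        # "x" was unbound when the inner except block ended,
        # so evaluating it here fails before anything is raised.
        raise x


try:
    raise_unbound_name()
except NameError as exc:    # UnboundLocalError is a subclass of NameError
    error = exc

print(type(error).__name__)
```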

The proposal I made is meant to allow having proper exception cleanup
semantics as mandated by the py3k spec, and yet be able to use a bare
"raise" re-raising statement in non-trivial nested exception handler
situations.

Or of course we can just decide that bare "raise" is obsolete in 3.x and
must be replaced with a properly qualified raise statement.

Regards

Antoine.


From solipsis at pitrou.net  Mon May 26 18:58:15 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Mon, 26 May 2008 16:58:15 +0000 (UTC)
Subject: [Python-3000] Exception re-raising woes
References: <loom.20080526T091655-961@post.gmane.org>
	<F1962646D3B64642B7C9A06068EE1E640366F8FA@ex10.hostedexchange.local>
	<47085.192.165.213.18.1211820534.squirrel@webmail.nerim.net>
Message-ID: <loom.20080526T165634-225@post.gmane.org>

Antoine <solipsis <at> pitrou.net> writes:
> 
> The proposal I made is meant to allow having proper exception cleanup
> semantics as mandated by the py3k spec, and yet be able to use a bare
> "raise" re-raising statement in non-trivial nested exception handler
> situations.

I forgot to add that sys.exc_info() is probably impacted too.
Actually, anything which retrieves the current thread's exception state...
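
The same nesting can be probed through sys.exc_info(). This is a sketch
assuming a current CPython 3; exc_info_after_nested_handler is a made-up
helper name. It records the exception value reported inside the nested
handler and again after control returns to the enclosing handler.

```python
import sys


def exc_info_after_nested_handler():
    try:
        raise Exception("foo")
    except Exception:
        try:
            raise KeyError("caught")
        except KeyError:
            # Inside the nested handler the KeyError is current.
            inner = sys.exc_info()[1]
        # Back in the enclosing handler.
        outer = sys.exc_info()[1]
        return inner, outer


inner, outer = exc_info_after_nested_handler()
print(repr(inner), repr(outer))
```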

Regards

Antoine.


From brett at python.org  Mon May 26 19:45:06 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 26 May 2008 10:45:06 -0700
Subject: [Python-3000] http package: _FooCookieJar modules?
In-Reply-To: <g1ejqk$lpm$1@ger.gmane.org>
References: <g1ejqk$lpm$1@ger.gmane.org>
Message-ID: <bbaeab100805261045t1a4fb774w70920556469bdba5@mail.gmail.com>

On Mon, May 26, 2008 at 8:12 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> dbm and xmlrpc are done, now I'm at the http package, and have a
> question:
>
> Is there any reason to keep the _MozillaCookieJar and _LWPCookieJar
> modules separate from http.cookiejar? I'd rather merge them into
> http.cookiejar and have two less strangely named modules.
>

They have leading underscores, so do what you will. If anyone directly
imports them that's their problem.

-Brett

From g.brandl at gmx.net  Mon May 26 20:01:22 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Mon, 26 May 2008 20:01:22 +0200
Subject: [Python-3000] http package: _FooCookieJar modules?
In-Reply-To: <bbaeab100805261045t1a4fb774w70920556469bdba5@mail.gmail.com>
References: <g1ejqk$lpm$1@ger.gmane.org>
	<bbaeab100805261045t1a4fb774w70920556469bdba5@mail.gmail.com>
Message-ID: <g1etn8$p5g$1@ger.gmane.org>

Brett Cannon wrote:
> On Mon, May 26, 2008 at 8:12 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>> dbm and xmlrpc are done, now I'm at the http package, and have a
>> question:
>>
>> Is there any reason to keep the _MozillaCookieJar and _LWPCookieJar
>> modules separate from http.cookiejar? I'd rather merge them into
>> http.cookiejar and have two less strangely named modules.
>>
> 
> They have leading underscores, so do what you will. If anyone directly
> imports them that's their problem.

Done.
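
For readers checking the result of the merge, a minimal sketch assuming a
current CPython 3, where both classes are importable from http.cookiejar:

```python
from http.cookiejar import FileCookieJar, LWPCookieJar, MozillaCookieJar

# Both file-backed jars are defined in the one public module.
for jar_class in (LWPCookieJar, MozillaCookieJar):
    print(jar_class.__name__, "is defined in", jar_class.__module__)

# No file is read or written until load() or save() is called.
jar = MozillaCookieJar()
```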

Georg


From brett at python.org  Mon May 26 21:08:55 2008
From: brett at python.org (Brett Cannon)
Date: Mon, 26 May 2008 12:08:55 -0700
Subject: [Python-3000] http package: _FooCookieJar modules?
In-Reply-To: <g1etn8$p5g$1@ger.gmane.org>
References: <g1ejqk$lpm$1@ger.gmane.org>
	<bbaeab100805261045t1a4fb774w70920556469bdba5@mail.gmail.com>
	<g1etn8$p5g$1@ger.gmane.org>
Message-ID: <bbaeab100805261208m56aef0d7yd0e432e80eb580bc@mail.gmail.com>

On Mon, May 26, 2008 at 11:01 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> Brett Cannon wrote:
>>
>> On Mon, May 26, 2008 at 8:12 AM, Georg Brandl <g.brandl at gmx.net> wrote:
>>>
>>> dbm and xmlrpc are done, now I'm at the http package, and have a
>>> question:
>>>
>>> Is there any reason to keep the _MozillaCookieJar and _LWPCookieJar
>>> modules separate from http.cookiejar? I'd rather merge them into
>>> http.cookiejar and have two less strangely named modules.
>>>
>>
>> They have leading underscores, so do what you will. If anyone directly
>> imports them that's their problem.
>
> Done.

Great!

Wow, PEP 3108 might actually get finished before the beta!

-Brett

From lists at cheimes.de  Mon May 26 23:34:58 2008
From: lists at cheimes.de (Christian Heimes)
Date: Mon, 26 May 2008 23:34:58 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483AD138.7000804@egenix.com>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com>
	<483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com>
Message-ID: <483B2D02.8040400@cheimes.de>

M.-A. Lemburg wrote:
> Isn't that an awfully confusing approach?
> 
> Wouldn't it be better to keep the PyString APIs and definitions in
> stringobject.c|h and only add a new bytesobject.h header file that #defines
> the PyBytes APIs in terms of the PyString APIs? That maintains
> backwards compatibility and allows Python internals to use the
> new API names.
> 
> With your approach, you've basically backported the confusing
> notion in Py3k that str() maps to PyUnicode, only that in Py2
> str() will now map to PyBytes.

The last time I brought up the topic, I had a lengthy discussion with
Guido. At first I wanted to rename the API in Python 3.0 only. Guido
argued that it's going to cause too many merge conflicts. He then
suggested the approach I implemented today.

I find the approach less confusing than your suggestion and my initial
idea. The internal API names are consistent for Python 2.6 and 3.0. The
byte string C API is prefixed PyBytes and the unicode C API is prefixed
PyUnicode. A core developer just has to remember that 'str' is a byte
string in 2.x but a unicode object in 3.0.

Extension developers don't have to worry at all. The ABI and external
API is mostly the same and still exposes the 'str' functions as PyString.

> You'd have to add an alias bytes -> str to the builtins to
> at least reduce the confusion a bit.

Python 2.6 already has an alias bytes -> str

> Yes, but please let's first discuss this some more. I don't think
> that the timing was right.... you started this thread just yesterday
> and the patches are already checked in.

I'm sorry if I was too hasty for you. I got +1 from a couple of
developers and it's basically Guido's suggestion.

Christian

From jimjjewett at gmail.com  Tue May 27 00:53:41 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 26 May 2008 18:53:41 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <20080524171814.GA4026@phd.pp.ru>
References: <ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
Message-ID: <fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>

On 5/24/08, Oleg Broytmann <phd at phd.pp.ru> wrote:
> On Sat, May 24, 2008 at 12:53:08PM -0400, Jim Jewett wrote:
>  > if I want pretty, I'll use print (or pprint).

>    str(container_of_strings) uses repr(), so you lose prettiness on either
>  print or '%s' % container_of_strings.

This is not a problem with repr; it is a bug with str.

I certainly support a flag for repr meaning "This was really str; repr
got called because the container doesn't have str, but go back to str
for the contents."  (Alternatively, write an explicit repr that does
that, add it to the builtin types, and make it available for easy use
with extensions.)

> Exceptions use repr() for file names,
>  e.g., which is very inconvenient, IMHO.

I'm not sure I fully understand this problem, but I would expect the
right solution to be a change to either Exception.__str__ or the way
filename-related exceptions are initialized.  Changing all of repr is
again overkill.

-jJ

From phd at phd.pp.ru  Tue May 27 01:03:30 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 27 May 2008 03:03:30 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
References: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
	<fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
Message-ID: <20080526230330.GB8849@phd.pp.ru>

On Mon, May 26, 2008 at 06:53:41PM -0400, Jim Jewett wrote:
> On 5/24/08, Oleg Broytmann <phd at phd.pp.ru> wrote:
> > On Sat, May 24, 2008 at 12:53:08PM -0400, Jim Jewett wrote:
> >  > if I want pretty, I'll use print (or pprint).
> 
> >    str(container_of_strings) uses repr(), so you lose prettiness on either
> >  print or '%s' % container_of_strings.
> 
> This is not a problem with repr; it is a bug with str.

   str(container) tries to call container.__str__ (which is absent) and
then container.__repr__. Is "str or repr" a bug?
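The fallback described here can be seen in a short sketch, written for
Python 3. The class name `Label` is made up for illustration; only the
builtin str(), repr() and list are real.

```python
class Label:
    """Hypothetical class whose str() and repr() output differ."""

    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __repr__(self):
        return "Label(%r)" % self.text


item = Label("an A")

# Called directly, str() finds Label.__str__.
direct = str(item)

# list defines no __str__ of its own, so str() ends up in list.__repr__,
# which calls repr() on every item.
in_list = str([item])

print(direct)
print(in_list)
```

Whether that second result counts as a bug or as the intended convention
is the point under discussion; the sketch only shows the mechanism.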

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From phd at phd.pp.ru  Tue May 27 01:24:02 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 27 May 2008 03:24:02 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
References: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
	<fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
Message-ID: <20080526232402.GC8849@phd.pp.ru>

On Mon, May 26, 2008 at 06:53:41PM -0400, Jim Jewett wrote:
> I certainly support a flag for repr meaning "This was really str; repr
> got called because the container doesn't have str, but go back to str
> for the contents."  (Alternatively, write an explicit repr that does
> that, add it to the builtin types, and make it available for easy use
> with extensions.)
> 
> > Exceptions use repr() for file names,
> >  e.g., which is very inconvenient, IMHO.
> 
> I'm not sure I fully understand this problem, but I would expect the
> right solution to be a change to either Exception.__str__ or the way
> filename-related exceptions are initialized.  Changing all of repr is
> again overkill.

   There are two different and unrelated problems. One is that
str(container) calls repr() on items. This probably could be fixed with
a flag to repr() so it remembers it was called from str(). This has nothing
to do with hex-encoding strings - calling str() on items would be a win in any
case, especially for items that implement both __str__ and __repr__
methods.
   The other problem is that repr(string) returns it hex-encoded. The second
problem is hard to fix without changing repr itself, because repr() is used
in many places (exceptions were only one example). On the other hand those
who want the old (current) behaviour can do
repr(obj).encode("ascii", errors="backslashreplace").
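A minimal sketch of that escape hatch, assuming the Python 3 behaviour in
which repr() keeps printable non-ASCII characters and the ascii() builtin
proposed in the PEP is available. The sample string is arbitrary.

```python
text = "caf\u00e9 \u65e5\u672c"

# repr() leaves printable non-ASCII characters readable.
readable = repr(text)

# ascii() escapes everything outside the ASCII range.
escaped = ascii(text)

# The spelling suggested above; note that encode() returns bytes, not str.
encoded = repr(text).encode("ascii", errors="backslashreplace")

print(readable)
print(escaped)
print(encoded)
```

The last two carry the same escape sequences; they differ only in type
(str versus bytes).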

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From jimjjewett at gmail.com  Tue May 27 01:26:56 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 26 May 2008 19:26:56 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>

Summary:

The only reason for this change is that __repr__ gets used when
__str__ *should* be used instead.

Fix that bug instead of making repr less predictable.

On 5/25/08, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Jim Jewett writes:

>   > I'm more worried that it might look like English, yet be subtly
>   > (and importantly) different.

> Let me remind you that I advocated that position, and (1) Martin
> shot me down hard, and (2) Guido indicated that it is a point,
> but he now seems happy enough not to worry about it.

I will agree that this is similar to the issue of non-ascii
identifiers.  If you can always trust everything on your system
completely, then it doesn't matter whether or not you can even read
the code.  If you might have to at least review things, then
confusability is an issue.

The question is where to draw the line.

I see print (and therefore str) as being intended for people, so they
should clearly use as much of unicode as available.

Identifiers are not usually part of the UI, so the case isn't as
strong (but i agree that it is now settled).

repr is not for normal UI; it is in explicit contrast to str.  I
therefore believe it should default to the safest possible
representation.

>... in view of the wide variety of cases where it seems to be
>  used for something other than diagnosing normally invisible
> features of output.

These are bugs.  I haven't yet seen a single case where it *should*
have been using repr instead of str.  Unfortunately, str itself
resorts to repr in some cases, and -- buggily -- then stays in repr
mode as it recurses down.

The right answer is not to make repr less predictable; it is to make
those str representations better -- if only by having them go back to
str(x) for the containers' contents.

>   > I just want it to be very easy to say "on my system, repr is ASCII".

> That is in all proposals.

Then I sometimes missed it.  And I'll note that it didn't happen for
identifiers.

>   > I would prefer that ASCII also be the default, so that people who want
>   > more characters opt in to receive them,

>... given the extent to
>  which repr is used to produce output meaningful to end-users
>  (vs. diagnostics for application and/or Python maintainers).

Again -- *why* is repr used instead of str?  As nearly as I can tell,
it is because of bugs (admittedly, often in builtin __str__
functions); making repr less predictable is a workaround rather than a
solution.

-jJ

From phd at phd.pp.ru  Tue May 27 01:34:32 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 27 May 2008 03:34:32 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
References: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
Message-ID: <20080526233432.GD8849@phd.pp.ru>

On Mon, May 26, 2008 at 07:26:56PM -0400, Jim Jewett wrote:
> Summary:
> 
> The only reason for this change is that __repr__ gets used when
> __str__ *should* be used instead.

   No, it is not the only reason. The other reason is that repr() is used in
many different places where we want readable output.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From rhamph at gmail.com  Tue May 27 02:00:02 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 26 May 2008 18:00:02 -0600
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <20080526232402.GC8849@phd.pp.ru>
References: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
	<fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
	<20080526232402.GC8849@phd.pp.ru>
Message-ID: <aac2c7cb0805261700k78bceb80g9ddec2465721abf7@mail.gmail.com>

On Mon, May 26, 2008 at 5:24 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:
> On Mon, May 26, 2008 at 06:53:41PM -0400, Jim Jewett wrote:
>> I certainly support a flag for repr meaning "This was really str; repr
>> got called because the container doesn't have str, but go back to str
>> for the contents."  (Alternatively, write an explicit repr that does
>> that, add it to the builtin types, and make it available for easy use
>> with extensions.)
>>
>> > Exceptions use repr() for file names,
>> >  e.g., which is very inconvenient, IMHO.
>>
>> I'm not sure I fully understand this problem, but I would expect the
>> right solution to be a change to either Exception.__str__ or the way
>> filename-related exceptions are initialized.  Changing all of repr is
>> again overkill.
>
>   There are two different and unrelated problems. One is that
> str(container) calls repr() on items. This probably could be fixed with
> a flag to repr() so it remembers it was called from str(). This has nothing
> to do with hex-encoding strings - calling str() on items would be a win in any
> case, especially for items that implement both __str__ and __repr__
> methods.

There's a reason for that convention.  Would you prefer str(['1', '2',
'3']) return '[1, 2, 3]'?

Personally, I'm happy with Guido's suggestion of stderr and
interactive to backslashreplace and stdout to strict.  It's a dramatic
change, but I think I can get used to it.

-- 
Adam Olsen, aka Rhamphoryncus

From jimjjewett at gmail.com  Tue May 27 03:06:55 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 26 May 2008 21:06:55 -0400
Subject: [Python-3000] UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
Message-ID: <fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>

On 5/24/08, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
>  Specification
>  =============

It might help to call out which parts are changes.  If I understand
correctly, the only changes (as opposed to additions) are for
characters which are (all three of)

(a)  outside of ASCII
(b)  not broken (that is, not half of a surrogate pair)
(c)  not in the new excluded set.

>   * Characters defined in the Unicode character database as "Separator"
>     (Zl, Zp, Zs) other than ASCII space(0x20).

Please put in a note that  Zl and Zp refer only to two specific
unicode characters, not to what most people think of as line
separators or paragraph markers.
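The distinction can be checked with the standard unicodedata module. This
is only an illustrative sketch; the dictionary keys are labels chosen
here, not names from the PEP.

```python
import unicodedata

LINE_SEPARATOR = "\u2028"        # the character in category Zl
PARAGRAPH_SEPARATOR = "\u2029"   # the character in category Zp

categories = {
    "LINE SEPARATOR": unicodedata.category(LINE_SEPARATOR),
    "PARAGRAPH SEPARATOR": unicodedata.category(PARAGRAPH_SEPARATOR),
    # The everyday line endings are control characters, not separators.
    "newline": unicodedata.category("\n"),
    "carriage return": unicodedata.category("\r"),
}

for name, category in categories.items():
    print(name, category)
```

So "\n" and "\r" fall under the control-character rule (Cc), not under
the Zl/Zp rule quoted above.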

>   * Backslash-escape quote characters(apostrophe, ') and add quote
>     character at the beginning and the end.

Do you just mean the two ASCII quotation marks  that python uses?

As written, I wondered whether it would include backquote or guillemet.

>  - Add ``'%a'`` string format operator. ``'%a'`` converts any python
>   object to string using ``repr()`` and then hex-escape all non-ASCII
>   characters. ``'%a'`` operator generates same string as ``'%r'`` in
>   Python 2.

Then why not keep the old %r, and add a new one for the unicode repr?

Is it again because of the bug where str([..., mystr, ...])   ends up
doing repr on mystr?

>  - Add ``ascii()`` builtin function. ``ascii()`` converts any python
>   object to string using ``repr()`` and then hex-escape all non-ASCII
>   characters. ``ascii()`` generates same string as ``repr()`` in Python 2.

The problem isn't that I want to be able to write code that acts the
old way; the problem is that I want to ensure all code running on my
system acts the old way.

Adding an ascii() function doesn't help.

Keeping repr and adding full_repr would work (because I could look for
the new name).

Keeping repr and fixing the way it recurses when used as a str
fallback would be even better.

>   Strings to be printed for debugging are not only contained by lists or
>   dicts, but also in many other types of object. File objects contain a
>   file name in Unicode, exception objects contain a message in Unicode,
>   etc. These strings should be printed in readable form when repr()ed.
>   It is unlikely to be possible to implement a tool to print all
>   possible object types.

You could go a long way (particularly in Py3k, where everything
inherits from object) by changing the builtin containers, and changing
object.__str__ to try

     "<%s: %s>" % (type(v), iter(v))

before falling back to repr.  (You may wish something that looks for
mappings and sequences instead of any iterables.  You may wish to
change the exact look of the repr -- the point is just to tell the
contained objects to try str.)

>  - Make the encoding used by ``unicode_repr()`` adjustable, and make
>   current ``repr()`` as default.

>   With adjustable ``repr()``, result of ``repr()`` is unpredictable and
>   would make impossible to write correct code involving ``repr()``.

No more so than 3138.  The setting of repr is predictable on a given
system.  (Even if you make it changeable during a single run, it is
predictable by checking first.)  Across systems, the 3138 proposal is
already unpredictable, because you don't know which systems will apply
backslash-replace on which characters (and on which runs).

-jJ

From greg.ewing at canterbury.ac.nz  Tue May 27 04:04:51 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 27 May 2008 14:04:51 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
References: <ca471dc20805221418o60b06e0gc3997383183ff1c5@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
	<fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
Message-ID: <483B6C43.1060401@canterbury.ac.nz>

Jim Jewett wrote:

> I certainly support a flag for repr meaning "This was really str; repr
> got called because the container doesn't have str, but go back to str
> for the contents."

Doing this properly would require changing the signature
of repr() *everywhere*, not just for strings, because
the flag needs to be propagated recursively to any nested
object that could have strings in it.

> Alternatively, write an explicit repr that does
> that,

Again, every container type would need to know about and handle
this new form of repr().

-- 
Greg

From jimjjewett at gmail.com  Tue May 27 04:13:10 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Mon, 26 May 2008 22:13:10 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <aac2c7cb0805261700k78bceb80g9ddec2465721abf7@mail.gmail.com>
References: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
	<fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
	<20080526232402.GC8849@phd.pp.ru>
	<aac2c7cb0805261700k78bceb80g9ddec2465721abf7@mail.gmail.com>
Message-ID: <fb6fbf560805261913neb157dya75f85494953ae95@mail.gmail.com>

On 5/26/08, Adam Olsen <rhamph at gmail.com> wrote:
> On Mon, May 26, 2008 at 5:24 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:

>  >   There are two different and unrelated problems. One is that
>  > str(container) calls repr() on items. This probably could be fixed with
>  > a flag to repr() so it remembers it was called from str(). This has nothing
>  > to do with hex-encoding strings - calling str() on items would be a win in any
>  > case, especially for items that implement both __str__ and __repr__
>  > methods.

> There's a reason for that convention.  Would you prefer str(['1', '2',
>  '3']) return '[1, 2, 3]'?

I don't think anyone is arguing about how to display

    >>> "%s" % string

The problem is classes where str(x) != repr(x), and how they get
messed up when a container holding (one of their) instances is
printed.

>>> class A:
	def __str__(self):  return "an A"
>>> a=A()

>>> print a    # this is fine.
an A
>>> str(a)      # this is OK, you have asked for "%s" % a
'an A'
>>> repr(a)   # this is OK, you wanted repr explicitly.
'<__main__.A instance at 0x012DDAF8>'

>>> print ([a])   # this stinks ...
[<__main__.A instance at 0x012DDAF8>]

It would be much better as:

>>> print ([a])   # after fixing the recursion bug
['an A']


Whereas you are asking about the (perhaps also acceptable):

>>> # after fixing the recursion bug,
>>> print ([a])   # and somehow not even applying str
[an A]

-jJ

From greg.ewing at canterbury.ac.nz  Tue May 27 04:12:51 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Tue, 27 May 2008 14:12:51 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
Message-ID: <483B6E23.3050805@canterbury.ac.nz>

Jim Jewett wrote:

> repr is not for normal UI;

Except that there seem to be places where it *is* used
for normal UI, e.g. putting filenames into error messages.

-- 
Greg

From ncoghlan at gmail.com  Tue May 27 04:32:24 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 27 May 2008 12:32:24 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805261913neb157dya75f85494953ae95@mail.gmail.com>
References: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>	<20080524171814.GA4026@phd.pp.ru>	<fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>	<20080526232402.GC8849@phd.pp.ru>	<aac2c7cb0805261700k78bceb80g9ddec2465721abf7@mail.gmail.com>
	<fb6fbf560805261913neb157dya75f85494953ae95@mail.gmail.com>
Message-ID: <483B72B8.30906@gmail.com>

Jim Jewett wrote:
> On 5/26/08, Adam Olsen <rhamph at gmail.com> wrote:
>> On Mon, May 26, 2008 at 5:24 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:
> 
>>  >   There are two different and unrelated problems. One is that
>>  > str(container) calls repr() on items. This probably could be fixed with
>>  > a flag to repr() so it remembers it was called from str(). This has nothing
>>  > to do with hex-encoding strings - calling str() on items would be a win in any
>>  > case, especially for items that implement both __str__ and __repr__
>>  > methods.
> 
>> There's a reason for that convention.  Would you prefer str(['1', '2',
>>  '3']) return '[1, 2, 3]'?
> 
> I don't think anyone is arguing about how to display
> 
>     >>> "%s" % string
> 
> The problem is classes where str(x) != repr(x), and how they get
> messed up when a container holding (one of their) instances is
> printed.

This is NOT a bug, since str([1, 2, 3]) and str(list("123")) SHOULD 
produce results that look different. Calling str() internally to display 
the contents of containers is a broken idea. The ambiguity that 
recursive calls to str() would introduce would make any concerns about 
potential confusion between different Unicode glyphs seem utterly 
inconsequential.
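To make the ambiguity concrete, here is a rough sketch. The helper
`str_items` is hypothetical: it stands in for a list formatter that calls
str() on the contents, which is not how the builtin list behaves.

```python
def str_items(seq):
    """Hypothetical formatter that applies str() to each item."""
    return "[" + ", ".join(str(item) for item in seq) + "]"


numbers = [1, 2, 3]
digits = list("123")

# Current behaviour: items go through repr(), so the two lists differ.
print(str(numbers), str(digits))

# With str() applied to the contents, the two become indistinguishable.
print(str_items(numbers), str_items(digits))
```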

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Tue May 27 04:48:08 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 27 May 2008 12:48:08 +1000
Subject: [Python-3000] UPDATED: PEP 3138- String representation
 in	Python 3000
In-Reply-To: <fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
Message-ID: <483B7668.1090800@gmail.com>

Jim Jewett wrote:
> Is it again because of the bug where str([..., mystr, ...])   ends up
> doing repr on mystr?

Jim, could you please stop describing this behaviour as a bug? It is a 
perfectly legitimate and desirable approach that ensures lists of 
different items look different when displayed.

Or are you actually stating that you *want* str([1, 2, 3]) and 
str(list("123")) to produce the same output?

>>  - Add ``ascii()`` builtin function. ``ascii()`` converts any python
>>   object to string using ``repr()`` and then hex-escape all non-ASCII
>>   characters. ``ascii()`` generates same string as ``repr()`` in Python 2.
> 
> The problem isn't that I want to be able to write code that acts the
> old way; the problem is that I want to ensure all code running on my
> system acts the old way.

This is for Py3k - you'll be lucky if your old code runs at all, let 
alone in the same way.

> Adding an ascii() function doesn't help.
> 
> Keeping repr and adding full_repr would work (because I could look for
> the new name).

Py3k. The default option should do the right thing (and in that 
day-and-age, that means permitting Unicode, rather than restricting 
object representations to the anglo-centric ASCII subset). The ascii() 
function would just be a convenience for those cases where the 
programmer deliberately wants to be anglo-centric.

> Keeping repr and fixing the way it recurses when used as a str
> fallback would be even better.

No it wouldn't - the ambiguity introduced by doing so would dwarf 
anything we might introduce by permitting arbitrary Unicode characters 
in repr() output.

> No more so than 3138.  The setting of repr is predictable on a given
> system.  (Even if you make it changeable during a single run, it is
> predictable by checking first.)  Across systems, the 3138 proposal is
> already unpredictable, because you don't know which systems will apply
> backslash-replace on which characters (and on which runs).

If you're worried about doctests, those should be using StringIO 
objects, so nothing will ever need to be backslash replaced (since it 
will be Unicode all the way).

In terms of actual IO for display to a user, why do you care if 
something gets backslash replaced or not? The characters which are 
replaced will only be those which the user's terminal can't display anyway.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From stephen at xemacs.org  Tue May 27 05:02:45 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Tue, 27 May 2008 12:02:45 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
Message-ID: <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>

Jim Jewett writes:

 > The only reason for this change is that __repr__ gets used when
 > __str__ *should* be used instead.

That's not what the advocates say.

Now, repr() is supposed to return something that is acceptable to eval
(but doesn't always, especially for recursive objects), while str() is
supposed to be more "user-friendly" (but can be horrible if you need
to see precisely what the contents are or on an output device that's
not prepared for it).

As far as I can tell, which should be used is a "beauty in the eye of
the beholder" issue, and in the case of repr() Spanish and Chinese
users are going to feel more or less differently from Americans about
which characters should be escaped.

 > repr is not for normal UI; it is in explicit contrast to str.  I
 > therefore believe it should default to the safest possible
 > representation.

Well, in `String Conversions', the manual says """In particular,
converting a string adds quotes around it and converts 
"funny" characters to escape sequences that are safe to print."""

Now, I agree with you about what's "safe".  However, in a text-
processing application in a Japanese environment, that's hardly
useful, and our Japanese programmer can argue that in his environment,
printing all of Unicode *is* safe.  Furthermore, most people run in
environments where printing Unicode is safe.

 >>> I just want it to be very easy to say "on my system, repr is ASCII".
 > 
 >> That is in all proposals.
 > 
 > Then I sometimes missed it.

I should say, "that was in Guido's desiderata, so I assume anything
still on the table has it".  Viz:

    2. If you don't want any non-ASCII printed to a file, set the file's
    encoding to ASCII and the error handler to backslashescape.
    (In <ca471dc20805221055j52594fd2offb7fa3fcf936629 at mail.gmail.com>)

If that's not easy enough for you (I sympathize!), then you need to
get Guido's ear.
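For reference, a sketch of what that setting might look like with the
Python 3 io module. It writes to an in-memory buffer as a stand-in for a
real file, and uses the error handler name as it is spelled in the codecs
machinery, "backslashreplace". The sample text is arbitrary.

```python
import io

raw = io.BytesIO()  # stand-in for a real binary file

# ASCII-only text layer; anything else is written as a backslash escape.
# newline="\n" keeps the output identical across platforms.
out = io.TextIOWrapper(raw, encoding="ascii",
                       errors="backslashreplace", newline="\n")

out.write("na\u00efve \u65e5\u672c\n")
out.flush()

data = raw.getvalue()
print(data)
```

The same two keyword arguments are accepted by the builtin open() for
text-mode files.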

 > And I'll note that it didn't happen for identifiers.

That's on input, which is very much a different question.

 > Again -- *why* is repr used instead of str?

I don't use it myself other than as a way of diagnosing bugs in
programs I write or maintain; in personal practice, I'm in your camp.
But my understanding is that there is often an intermediate level,
such as a website admin, who needs *some* of the precision of repr()
such as escaped representation of whitespace, but also needs to be
able to read most of the output.  It so happens that repr() works as
designed for ASCII and acceptably so for ISO Latin, precisely because
it *was* designed for ASCII!  It sucks for non-Western-European
scripts, though, including the ISO 8859 scripts for Cyrillic, Greek,
Arabic, and Hebrew.

My understanding is that there are more use-cases than there are
stringifying functions and methods.  Something's got to give.


From rhamph at gmail.com  Tue May 27 06:52:17 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Mon, 26 May 2008 22:52:17 -0600
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805261913neb157dya75f85494953ae95@mail.gmail.com>
References: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
	<fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
	<20080526232402.GC8849@phd.pp.ru>
	<aac2c7cb0805261700k78bceb80g9ddec2465721abf7@mail.gmail.com>
	<fb6fbf560805261913neb157dya75f85494953ae95@mail.gmail.com>
Message-ID: <aac2c7cb0805262152p4b0c5919ra2022c9ddded304f@mail.gmail.com>

On Mon, May 26, 2008 at 8:13 PM, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 5/26/08, Adam Olsen <rhamph at gmail.com> wrote:
>> There's a reason for that convention.  Would you prefer str(['1', '2',
>>  '3']) return '[1, 2, 3]'?
>
> I don't think anyone is arguing about how to display
>
>    >>> "%" % string
>
> The problem is classes where str(x) != repr(x), and how they get
> messed up when a container holding (one of their) instances is
> printed.
>
>>>> class A:
>        def __str__(self):  return "an A"
>>>> a=A()
>
>>>> print a    # this is fine.
> an A
>>>> str(a)      # this is OK, you have asked for "%s" % a
> 'an A'
>>>> repr(a)   # this is OK, you wanted repr explicitly.
> '<__main__.A instance at 0x012DDAF8>'
>
>>>> print ([a])   # this stinks ...
> [<__main__.A instance at 0x012DDAF8>]
>
> It would be much better as:
>
>>>> print ([a])   # after fixing the recursion bug
> ['an a']
>
>
>
> Whereas you are asking about the (perhaps also acceptable):
>
>>>> # after fixing the recursion bug,
>>>> print ([a])   # and somehow not even applying str
> [an a]

Hmm, I see where the confusion is.  Containers only define __repr__,
so although you think it's the list.__str__ that's mistakenly using
repr(), it's str(list) itself that's calling repr(list).

So the question to ask is whether we can define a useful __str__ for
containers.  str(['an a']) -> '[an a]' is not too bad, but
str(['hello, world']) -> '[hello, world]' is ambiguous.  It crosses
the line into garbage.
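
A small sketch of both points, written for Python 3 and assuming CPython's built-in list; naive_str is a hypothetical helper, not anything proposed in the thread:

```python
# list defines __repr__ but inherits __str__ from object, so str(list)
# falls through to repr(list).
has_own_str = "__str__" in list.__dict__
has_own_repr = "__repr__" in list.__dict__


def naive_str(items):
    """Hypothetical container __str__ that applies str() to each item."""
    return "[%s]" % ", ".join(str(item) for item in items)


one_item = ["hello, world"]
two_items = ["hello", "world"]

print(has_own_str, has_own_repr)
print(str(one_item), str(two_items))              # distinguishable
print(naive_str(one_item), naive_str(two_items))  # identical text
```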

We could probably define a third variant (beside __str__ and __repr__)
which'd be "pretty but unambiguous", but if it's just for escaping
unicode you should use PEP 3138's ascii_repr(), and if it's for more
it should be a separate discussion on python-list/python-ideas instead
of here.
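
For illustration only: in released Python 3 the ASCII-only variant is, as far as I know, spelled ascii() rather than ascii_repr(). A minimal sketch of the difference, assuming that spelling:

```python
text = "\u65e5\u672c\u8a9e"  # three CJK characters

readable = repr(text)        # printable non-ASCII characters are kept
escaped = ascii(text)        # non-ASCII characters become \u escapes
escaped_list = ascii([text]) # containers are escaped the same way

print(readable)
print(escaped)
print(escaped_list)
```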

Along the way I've found a new definition of repr: unambiguous when
used in a container's repr.

-- 
Adam Olsen, aka Rhamphoryncus

From phd at phd.pp.ru  Tue May 27 08:29:47 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 27 May 2008 10:29:47 +0400
Subject: [Python-3000] UPDATED: PEP 3138- String representation
	in	Python 3000
In-Reply-To: <483B7668.1090800@gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com>
Message-ID: <20080527062947.GA14808@phd.pp.ru>

On Tue, May 27, 2008 at 12:48:08PM +1000, Nick Coghlan wrote:
> Jim Jewett wrote:
> >Is it again because of the bug where str([..., mystr, ...])   ends up
> >doing repr on mystr?
> 
> Jim, could you please stop describing this behaviour as a bug?

   I am with  Jim on this part (but only on this). I'd like this

class Test:
    def __str__(self):
        return "STR"

    def __repr__(self):
        return "REPR"

test = Test()
print test
print repr(test)
print str(test)
print [test]
print str([test])
print repr([test])

   to print

STR
REPR
STR
[STR]
[STR]
[REPR]

   but the code actually prints

STR
REPR
STR
[REPR]
[REPR]
[REPR]

   str(container) not calling str() on items is at least a strange and
unexpected behaviour, if not a bug.
   Unfortunately, it is not easy to fix, and missing quotes on string items
is a loss (albeit a minor one for me), so I accepted the compromise - to
fix repr() and forget about str(container).
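
The same check, sketched for Python 3 syntax (print is a function there); the behaviour for containers is the one described above:

```python
class Test:
    def __str__(self):
        return "STR"

    def __repr__(self):
        return "REPR"


test = Test()

# str() on the object uses __str__, but str() on the list falls back to
# the list's repr, which applies repr() to every item.
results = [str(test), repr(test), str([test]), repr([test])]
for line in results:
    print(line)
```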

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From ncoghlan at gmail.com  Tue May 27 10:13:39 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 27 May 2008 18:13:39 +1000
Subject: [Python-3000] UPDATED: PEP 3138- String
 representation	in	Python 3000
In-Reply-To: <20080527062947.GA14808@phd.pp.ru>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>	<483B7668.1090800@gmail.com>
	<20080527062947.GA14808@phd.pp.ru>
Message-ID: <483BC2B3.6040308@gmail.com>

Oleg Broytmann wrote:
>    str(container) not calling str() on items is at least a strange and
> unexpected behaviour, if not a bug.

I have no problem at all with people calling this behaviour surprising 
and unexpected, but I'm not happy with them calling it a bug without 
being challenged, since there are very good reasons for it working the 
way it does.

str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce 
distinctive output: calling repr() on container contents achieves this, 
calling str() does not.

Strings are a good example of this ambiguity problem, but there are 
others, such as Decimal objects which can be indistinguishable from 
normal floats and integers when printed, but definitely aren't 
interchangeable with them:

 >>> x = [1, 2, 3, 1.0, 2.0, 3.0]
 >>> y = map(Decimal, [1, 2, 3, '1.0', '2.0', '3.0'])
 >>> x
[1, 2, 3, 1.0, 2.0, 3.0]
 >>> y
[Decimal("1"), Decimal("2"), Decimal("3"), Decimal("1.0"), 
Decimal("2.0"), Decimal("3.0")]
 >>> x == y
False

The reason for the inequality is fairly obvious given a repr() based 
output for the containers (1.0 != Decimal('1.0') by design), but how big 
would the potential for confusion be if str() on containers invoked 
str() on their contents:

 >>> print '[%s]' % ', '.join(map(str, x))
[1, 2, 3, 1.0, 2.0, 3.0]
 >>> print '[%s]' % ', '.join(map(str, y))
[1, 2, 3, 1.0, 2.0, 3.0]

While it could be argued that if you want unambiguous output you should 
be invoking repr() on the container instead of str(), I'm still seeing 
many more downsides than upsides to the idea of making str() on the 
builtin containers display their contents with str() instead of repr().
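
A self-contained sketch of the same comparison for Python 3. The helper join_with is hypothetical. Equality between Decimal and float has changed between Python versions, so the sketch does not rely on x == y; it uses mixed arithmetic to show that the two kinds of number are not interchangeable:

```python
from decimal import Decimal

x = [1, 2, 3, 1.0, 2.0, 3.0]
y = [Decimal(v) for v in [1, 2, 3, "1.0", "2.0", "3.0"]]


def join_with(func, items):
    """Format items like a list, converting each one with func."""
    return "[%s]" % ", ".join(map(func, items))


str_x, str_y = join_with(str, x), join_with(str, y)
repr_x, repr_y = join_with(repr, x), join_with(repr, y)

# Decimal and float cannot be mixed in arithmetic.
try:
    y[3] + x[3]
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

print(str_x, str_y)
print(repr_x, repr_y)
print(mixing_allowed)
```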

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From phd at phd.pp.ru  Tue May 27 10:36:13 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 27 May 2008 12:36:13 +0400
Subject: [Python-3000] str(container) calls repr() (was: PEP 3138)
In-Reply-To: <483BC2B3.6040308@gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru>
	<483BC2B3.6040308@gmail.com>
Message-ID: <20080527083612.GA15216@phd.pp.ru>

On Tue, May 27, 2008 at 06:13:39PM +1000, Nick Coghlan wrote:
> str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce 
> distinctive output: calling repr() on container contents achieves this, 
> calling str() does not.

   String representation is a special case and *the only* special case, and
must be handled as a special case. I don't like this special case to be used
as a model for all other types. No other type allows usage like list("123").

> While it could be argued that if you want unambiguous output you should 
> be invoking repr() on the container instead of str(), I'm still seeing 
> many more downsides than upsides to the idea of making str() on the 
> builtin containers display their contents with str() instead of repr().

   The decision should be upon the user. In an ideal world str(container)
calls str() on items, and repr(container) calls repr() on items, so the
user can choose what [s]he wants. Currently user is just stuck with repr().

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From ncoghlan at gmail.com  Tue May 27 11:28:42 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 27 May 2008 19:28:42 +1000
Subject: [Python-3000] str(container) calls repr()
In-Reply-To: <20080527083612.GA15216@phd.pp.ru>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>	<483B7668.1090800@gmail.com>
	<20080527062947.GA14808@phd.pp.ru>	<483BC2B3.6040308@gmail.com>
	<20080527083612.GA15216@phd.pp.ru>
Message-ID: <483BD44A.3070902@gmail.com>

Oleg Broytmann wrote:
> On Tue, May 27, 2008 at 06:13:39PM +1000, Nick Coghlan wrote:
>> str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce 
>> distinctive output: calling repr() on container contents achieves this, 
>> calling str() does not.
> 
>    String representation is a special case and *the only* special case, and
> must be handled as a special case.

The problem arises whenever you have two different objects which can 
produce the same answer for str(), but different answers for repr(). 
Strings are the most obvious case, since str(obj) and str(repr(obj)) 
will always produce the same answer for any object which doesn't have 
separate __str__ and __repr__ implementations, but they aren't the only 
case (as I endeavoured to show with the Decimal example).


> I don't like this special case to be used
> as a model for all other types. No other type allows usage like list("123").

That's just the way I happened to write it. You can enter it as 
str(list(["1", "2", "3"])) if you prefer.

>    The decision should be upon the user. In an ideal world str(container)
> calls str() on items, and repr(container) calls repr() on items, so the
> user can choose what [s]he wants. Currently user is just stuck with repr().

That's hardly the case - developers are quite free to iterate over the 
container invoking str() on each of the sub-items and building the 
output that way.

All I'm really asking for here is for people to identify the use cases 
that justify introducing such a potential for ambiguity into the 
container implementations.

That has been done to my satisfaction for PEP 3138 (which is why I've 
switched from being an opponent of allowing repr() to return arbitrary 
Unicode characters to being a supporter of the PEP), but I've yet to see 
*any* specific use cases for containers invoking str() that wouldn't be 
better addressed with an application or library specific display loop.
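
One possible shape for such an application-specific display loop, as a rough sketch only. The names pretty and Named are made up, only list, tuple and dict are handled, and a self-referencing container would recurse without end:

```python
def pretty(obj):
    """Format obj with str() applied to leaf items instead of repr()."""
    if isinstance(obj, dict):
        pairs = ("%s: %s" % (pretty(k), pretty(v)) for k, v in obj.items())
        return "{%s}" % ", ".join(pairs)
    if isinstance(obj, (list, tuple)):
        opener, closer = ("[", "]") if isinstance(obj, list) else ("(", ")")
        return opener + ", ".join(pretty(item) for item in obj) + closer
    return str(obj)


class Named:
    """Hypothetical class whose str() and repr() differ."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __repr__(self):
        return "Named(%r)" % self.name


nested = pretty([Named("a"), (1, "two")])
mapping = pretty({"k": [Named("b")]})
print(nested)
print(mapping)
```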

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From phd at phd.pp.ru  Tue May 27 11:57:09 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Tue, 27 May 2008 13:57:09 +0400
Subject: [Python-3000] str(container) calls repr()
In-Reply-To: <483BD44A.3070902@gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru>
	<483BC2B3.6040308@gmail.com> <20080527083612.GA15216@phd.pp.ru>
	<483BD44A.3070902@gmail.com>
Message-ID: <20080527095709.GC15216@phd.pp.ru>

On Tue, May 27, 2008 at 07:28:42PM +1000, Nick Coghlan wrote:
> The problem arises whenever you have two different objects which can 
> produce the same answer for str(), but different answers for repr(). 

   Aside from strings themselves, the only example of such objects I can
imagine is numbers (ints, floats and decimals). str(12) is the same as
str('12') and the same as str(Decimal('12')), but that's all.

> All I'm really asking for here is for people to identify the use cases 
> that justify introducing such a potential for ambiguity into the 
> container implementations.

   Why are you so afraid of ambiguity? str() is supposed to produce
pretty output, not necessarily unambiguous, right? And if a user wants
non-ambiguity [s]he will use repr().

> I've yet to see 
> *any* specific use cases for containers invoking str() that wouldn't be 
> better addressed with an application or library specific display loop.

   Unfortunately, every library has to have its own specific loop, because
universal container traversing is impossible.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From mal at egenix.com  Tue May 27 12:10:25 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 27 May 2008 12:10:25 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483B2D02.8040400@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>
	<483AD138.7000804@egenix.com> <483B2D02.8040400@cheimes.de>
Message-ID: <483BDE11.509@egenix.com>

On 2008-05-26 23:34, Christian Heimes wrote:
> M.-A. Lemburg schrieb:
>> Isn't that an awfully confusing approach ?
>>
>> Wouldn't it be better to keep PyString APIs and definitions in
>> stringobject.c|h
>>
>> and only add a new bytesobject.h header file that #defines the
>> PyBytes APIs in terms of PyString APIs ? That maintains
>> backwards compatibility and allows Python internals to use the
>> new API names.
>>
>> With your approach, you've basically backported the confusing
>> notion in Py3k that str() maps PyUnicode, only that in Py2
>> str() will now map to PyBytes.
> 
> The last time I brought up the topic, I had a lengthy discussion with
> Guido. At first I wanted to rename the API in Python 3.0 only. Guido
> argued that it's going to cause too much merge conflicts. He then
> suggested the approach I implemented today.

That's the same argument that came up in the module renaming
discussion.

I have a feeling that we should be looking for better merge
tools, rather than implement code changes that cause more trouble
than do good, just because our existing tools aren't smart
enough.

Wouldn't it be possible to have a 2to3.py converter
take the 2.x code (including the C code), convert it and then
apply any changes to the 3.x branch ?

This wouldn't be merging in the classical sense, it would be
automated forward porting.

> I find the approach less confusing than your suggestion and my initial
> idea.

I disagree on that.

Renaming old APIs to use the new names by adding a header file with
#define <oldname> <newname> is standard practice.

Renaming the old APIs in the source code and undoing the renaming
with a header file is not.

> The internal API names are consistent for Python 2.6 and 3.0. The
> byte string C API is prefixed PyBytes and the unicode C API is prefixed
> PyUnicode. A core developer has just to remember that 'str' is a byte
> string in 2.x but an unicode object in 3.0.

So you've solved part of the problem for 3.x by moving the naming mixup
back to 2.x.

> Extension developers don't have to worry at all. The ABI and external
> API is mostly the same and still exposes the 'str' functions as PyString.

Well, yes, but only due to a preprocessor hack that turns the
names used in bytesobject.c back into names you'd normally look
for in stringobject.c.

And all this, just because Subversion can't handle merging of
symbol renaming.

>> You'd have to add an alias bytes -> str to the builtins to
>> at least reduce the confusion a bit.
> 
> Python 2.6 already has an alias bytes -> str
> 
>> Yes, but please let's first discuss this some more. I don't think
>> that the timing was right.... you started this thread just yesterday
>> and the patches are already checked in.
> 
> I'm sorry if I was too hasty for you. I got +1 from a couple of
> developers and it's basically Guido's suggestion.

Please discuss any changes of the 2.x code base on python-dev.

Such major changes do need more discussion and possibly a PEP as well.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 27 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania            40 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From ncoghlan at gmail.com  Tue May 27 12:27:21 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Tue, 27 May 2008 20:27:21 +1000
Subject: [Python-3000] str(container) calls repr()
In-Reply-To: <20080527095709.GC15216@phd.pp.ru>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>	<483B7668.1090800@gmail.com>
	<20080527062947.GA14808@phd.pp.ru>	<483BC2B3.6040308@gmail.com>
	<20080527083612.GA15216@phd.pp.ru>	<483BD44A.3070902@gmail.com>
	<20080527095709.GC15216@phd.pp.ru>
Message-ID: <483BE209.3090202@gmail.com>

Oleg Broytmann wrote:
> On Tue, May 27, 2008 at 07:28:42PM +1000, Nick Coghlan wrote:
>> The problem arises whenever you have two different objects which can 
>> produce the same answer for str(), but different answers for repr(). 
> 
>    Aside from strings themselves, the only example of such objects I can
> imagine is numbers (ints, floats and decimals). str(12) is the same as
> str('12') and the same as str(Decimal('12')), but that's all.

Those are the only examples I can think of in the standard library, but 
who knows what user code is doing. We shouldn't break that without 
compelling use cases.

>> All I'm really asking for here is for people to identify the use cases 
>> that justify introducing such a potential for ambiguity into the 
>> container implementations.
> 
>    Why are you so afraid of ambiguity? str() is supposed to produce
> pretty output, not necessarily unambiguous, right? And if a user wants
> non-ambiguity [s]he will use repr().

Agreed, but I see the fact that the 'pretty' representation of a 
container is also unambiguous as a feature rather than a bug (note that 
even the pprint pretty printing module uses repr() for container contents).
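
The pprint remark can be checked with a short sketch (Python 3; the Test class mirrors the one posted earlier in the thread):

```python
import pprint


class Test:
    def __str__(self):
        return "STR"

    def __repr__(self):
        return "REPR"


# pformat() returns the text that pprint() would print.
formatted = pprint.pformat([Test(), "1", 1])
print(formatted)
```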

*shrug* Make the case in a PEP for str() of the standard containers to 
recurse using str() instead of repr() and get enough people to agree 
that it's a good idea and I'll shut up about it. Until that happens, I'd 
like people to stop claiming that the current behaviour is a bug in any 
way shape or form.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From mal at egenix.com  Tue May 27 12:35:04 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 27 May 2008 12:35:04 +0200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <483BE3D8.4080806@egenix.com>

On 2008-05-27 05:02, Stephen J. Turnbull wrote:
> Jim Jewett writes:
> 
>  > The only reason for this change is that __repr__ gets used when
>  > __str__ *should* be used instead.
> 
> That's not what the advocates say.
> 
> Now, repr() is supposed to return something that is acceptable to eval
> (but doesn't always, especially for recursive objects), while str() is
> supposed to be more "user-friendly" (but can be horrible if you need
> to see precisely what the contents are or on an output device that's
> not prepared for it).

AFAIK, eval(repr(obj)) is no longer a requirement... simply because
it has always only worked for a small subset of objects and in
reality, you wouldn't want to call eval() on anything too often
due to the security implications.
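
A small sketch of how limited the round trip is, using ast.literal_eval() in place of eval() so that only literals are evaluated. The names round_trips and Opaque are made up for this example:

```python
import ast


def round_trips(value):
    """Return True if the repr of value evaluates back to an equal value."""
    try:
        return ast.literal_eval(repr(value)) == value
    except (ValueError, SyntaxError):
        return False


class Opaque:
    """Hypothetical class that keeps the default <... object at 0x...> repr."""


literal_results = [round_trips(v)
                   for v in (42, "text", [1, 2.5, "three"], {"k": (1, 2)})]
opaque_result = round_trips(Opaque())
print(literal_results, opaque_result)
```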

In my daily use, I see repr(obj) as a way to get a debugging text
view of an object, whereas str(obj) is a way to convert it into text.

If an object doesn't have a special debugging text view, then it's
fine to use the standard text view instead.

> As far as I can tell, which should be used is a "beauty in the eye of
> the beholder" issue, and in the case of repr() Spanish and Chinese
> users are going to feel more or less differently from Americans about
> which characters should be escaped.

I'm not sure that's always the case, but users should certainly
have the freedom to decide whether they prefer backslashed quoted
code points or glyphs on their screen.

-- 
Marc-Andre Lemburg
eGenix.com


From theller at ctypes.org  Tue May 27 12:56:27 2008
From: theller at ctypes.org (Thomas Heller)
Date: Tue, 27 May 2008 12:56:27 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483BDE11.509@egenix.com>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>
	<483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com>
Message-ID: <g1gpcp$teo$1@ger.gmane.org>

M.-A. Lemburg schrieb:
> On 2008-05-26 23:34, Christian Heimes wrote:
>> M.-A. Lemburg schrieb:
>>> Isn't that an awfully confusing approach ?
>>> 
>>> Wouldn't it be better to keep PyString APIs and definitions in 
>>> stringobject.c|h and only add a new bytesobject.h header file
>>> that #defines the PyBytes APIs in terms of PyString APIs ? That
>>> maintains backwards compatibility and allows Python internals to
>>> use the new API names.
>>> 
>>> With your approach, you've basically backported the confusing 
>>> notion in Py3k that str() maps PyUnicode, only that in Py2 str()
>>> will now map to PyBytes.
>> 
>> The last time I brought up the topic, I had a lengthy discussion
>> with Guido. At first I wanted to rename the API in Python 3.0 only.
>> Guido argued that it's going to cause too much merge conflicts. He
>> then suggested the approach I implemented today.
> 
> That's the same argument that came up in the module renaming 
> discussion.
> 
> I have a feeling that we should be looking for better merge tools,
> rather than implement code changes that cause more trouble than do
> good, just because our existing tools aren't smart enough.

There are applications out there that dynamically import the python dll
and link to the exported functions by name; they will all break.

I believe in the past we have been more careful with changes like these.
Even when python api functions were turned into cpp macros, we provided
exported functions for them; for a few examples see the function definitions
near line 1778 in file Python/pythonrun.c .

Thomas


From stefan_ml at behnel.de  Tue May 27 13:10:42 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 27 May 2008 11:10:42 +0000 (UTC)
Subject: [Python-3000] =?utf-8?q?Why_is_type=5Fmodified=28=29_in_typeobjec?=
	=?utf-8?q?t=2Ec_not_a_public_function=3F?=
Message-ID: <loom.20080527T110321-106@post.gmane.org>

Hi,

when we build extension classes in Cython, we have to first build the type to
make it available to user code, and then update the type's tp_dict while we run
the class body code (PyObject_SetAttr() does not work here). In Py2.6+, this
requires invalidating the method cache after each attribute change, which Python
does internally using the type_modified() function.

Could this function get a public interface? I do not think Cython is the only
case where C code wants to modify a type after its creation, and copying the
code over seems like a hack to me.

Stefan


From mal at egenix.com  Tue May 27 13:54:26 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Tue, 27 May 2008 13:54:26 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <g1gpcp$teo$1@ger.gmane.org>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>
	<483BDE11.509@egenix.com> <g1gpcp$teo$1@ger.gmane.org>
Message-ID: <483BF672.6020108@egenix.com>

On 2008-05-27 12:56, Thomas Heller wrote:
> M.-A. Lemburg schrieb:
>> On 2008-05-26 23:34, Christian Heimes wrote:
>>> M.-A. Lemburg schrieb:
>>>> Isn't that an awfully confusing approach ?
>>>>
>>>> Wouldn't it be better to keep PyString APIs and definitions in 
>>>> stringobject.c|h and only add a new bytesobject.h header file
>>>> that #defines the PyBytes APIs in terms of PyString APIs ? That
>>>> maintains backwards compatibility and allows Python internals to
>>>> use the new API names.
>>>>
>>>> With your approach, you've basically backported the confusing 
>>>> notion in Py3k that str() maps PyUnicode, only that in Py2 str()
>>>> will now map to PyBytes.
>>> The last time I brought up the topic, I had a lengthy discussion
>>> with Guido. At first I wanted to rename the API in Python 3.0 only.
>>> Guido argued that it's going to cause too much merge conflicts. He
>>> then suggested the approach I implemented today.
>> That's the same argument that came up in the module renaming 
>> discussion.
>>
>> I have a feeling that we should be looking for better merge tools,
>> rather than implement code changes that cause more trouble than do
>> good, just because our existing tools aren't smart enough.
> 
> There are applications out there that dynamically import the python dll
> and link to the exported functions by name; they will all break.

The exported APIs still use the old names. Just the source code
versions of the APIs change to the new names and they now live
in different files as well.

> I believe in the past we have been more careful with changes like these.
> Even when python api functions were turned into cpp macros, we provided
> exported functions for them; for a few examples see the function definitions
> near line 1778 in file Python/pythonrun.c .

IMO, we should keep using that strategy for Python 2.x.

-- 
Marc-Andre Lemburg
eGenix.com


From dalcinl at gmail.com  Tue May 27 15:39:02 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Tue, 27 May 2008 10:39:02 -0300
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID: <e7ba66e40805270639r15b624a8rb330e03831c8a155@mail.gmail.com>

Christian, some weeks ago I posted some observations about the status
of the PyNumberMethods API. The thread link is below; it did not receive
much attention.

http://mail.python.org/pipermail/python-3000/2008-May/013594.html

To summarize that post:

* 'nb_nonzero' was renamed to 'nb_bool'
* 'nb_inplace_divide' was removed
* 'nb_hex', 'nb_oct', and 'nb_coerce' are there, but they are unused

IMHO, the PyNumberMethods struct should either be left as it is in Py2
or be cleaned up, that is, all unused slots should be removed.


On 5/25/08, Christian Heimes <lists at cheimes.de> wrote:
> Hello!
>
>  The first set of betas of Python 2.6 and 3.0 is fast approaching. I'd like to
>  grab the final chance and clean up the C API of 2.6 and 3.0. I know, I
>  know, I brought up the topic two times in the past. But this time I mean
>  it for real! :]
>
>  Last time Guido said:
>  ---
>  I think it can actually be simplified. I think maintaining binary
>  compatibility between 2.6 and earlier versions is hopeless anyway, so
>  we might as well just rename PyString to PyBytes in 2.6 and 3.0, and
>  have an extra set of macros so that code using PyString needs to be
>  recompiled but not otherwise touched. E.g.
>
>  typedef { ... } PyBytesObject;
>  #define PyStringObject PyBytesObject
>
>  ... PyString_Type;
>  #define PyBytes_Type PyString_Type
>
>  <etc>
>  ---
>
>  I'd like to follow Guido's advice and change the code as follows:
>
>   * replace PyBytes_ with PyByteArray_
>   * replace PyString with PyBytes_
>   * rename bytesobject.[ch] to bytearrayobject.[ch]
>   * rename stringobject.[ch] to bytesobject.[ch]
>   * add a new file stringobject.h which contains the aliases PyString_ ->
>  PyBytes_
>
>  Christian


-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From bwinton at latte.ca  Tue May 27 17:31:05 2008
From: bwinton at latte.ca (Blake Winton)
Date: Tue, 27 May 2008 11:31:05 -0400
Subject: [Python-3000] UPDATED: PEP 3138- String
 representation	in	Python 3000
In-Reply-To: <483BC2B3.6040308@gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>	<483B7668.1090800@gmail.com>	<20080527062947.GA14808@phd.pp.ru>
	<483BC2B3.6040308@gmail.com>
Message-ID: <483C2939.2000409@latte.ca>

Nick Coghlan wrote:
> Oleg Broytmann wrote:
>>    str(container) not calling str() on items is at least a strange and
>> unexpected behaviour, if not a bug.
> I have no problem at all with people calling this behaviour surprising 
> and unexpected, but I'm not happy with them calling it a bug without 
> being challenged, since there are very good reasons for it working the 
> way it does.
> 
> str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce 
> distinctive output: calling repr() on container contents achieves this, 
> calling str() does not.

Why?

Seriously, I can write:
 >>> print 1, "1", Decimal("1")
and get as my output:
1 1 1
and somehow no-one complains that it should actually print:
1 "1" Decimal("1")
but for some reason when I put those in a list, they should magically 
change their display?

I think the burden of proof is on you to explain why, when we have a 
perfectly good name for unambiguous output ("repr"), do we need to 
override the name for ambiguous and nicely-formatted output ("str"), to 
achieve the same goal.

> While it could be argued that if you want unambiguous output you should 
> be invoking repr() on the container instead of str(), I'm still seeing 
> many more downsides than upsides to the idea of making str() on the 
> builtin containers display their contents with str() instead of repr().

But which downsides do you see that aren't solved by the use of repr to 
get unambiguous output?

Later,
Blake.
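
The two positions can be compared directly. The sketch below uses Python 3
syntax; str_items() is a hypothetical helper written only for this
comparison (it formats each item with str(), as the other side of the
thread proposes) and is not part of any proposal's text.

```python
def str_items(seq):
    """Hypothetical alternative: format a list using str() on each item."""
    return "[" + ", ".join(str(item) for item in seq) + "]"

samples = [[1, 2, 3], list("123"), ["1, 2, 3"]]

# Built-in behaviour: items are shown with repr(), so all three differ.
builtin = [str(s) for s in samples]

# Items shown with str(): all three collapse to the same text.
recursive = [str_items(s) for s in samples]
```

Under the built-in rule the three results are pairwise distinct; with
str_items() they are all "[1, 2, 3]", which is the ambiguity Nick objects
to and the one Blake argues is acceptable for str().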

From facundobatista at gmail.com  Tue May 27 18:01:07 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Tue, 27 May 2008 13:01:07 -0300
Subject: [Python-3000] Py3 docs
Message-ID: <e04bdf310805270901j658f9eb2v151da0fadae05ca9@mail.gmail.com>

Hi all!

Is there any official web site with the documentation for Python 3.0
in html format?

If not, let me share with you this one [1], updated nightly, kindly
set up by another Argentinian pythonista (Humitos).

Regards,

[1] http://humitos.homelinux.net/py3kdoc/

-- 
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From amauryfa at gmail.com  Tue May 27 18:06:05 2008
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Tue, 27 May 2008 18:06:05 +0200
Subject: [Python-3000] Py3 docs
In-Reply-To: <e04bdf310805270901j658f9eb2v151da0fadae05ca9@mail.gmail.com>
References: <e04bdf310805270901j658f9eb2v151da0fadae05ca9@mail.gmail.com>
Message-ID: <e27efe130805270906g182cd18fg28be9cdf29e2c1b7@mail.gmail.com>

Facundo Batista wrote:
> Is there any official web site with the documentation for Python 3.0
> in html format?
>
> If not, let me share with you this one [1], updated nightly, kindly
> set up by another Argentinian pythonista (Humitos).
>
> Regards,
>
> [1] http://humitos.homelinux.net/py3kdoc/

Well, I use this one every day:
http://docs.python.org/dev/3.0/

-- 
Amaury Forgeot d'Arc

From stefan_ml at behnel.de  Tue May 27 18:08:42 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Tue, 27 May 2008 16:08:42 +0000 (UTC)
Subject: [Python-3000] Py3 docs
References: <e04bdf310805270901j658f9eb2v151da0fadae05ca9@mail.gmail.com>
Message-ID: <loom.20080527T160758-748@post.gmane.org>

Facundo Batista <facundobatista <at> gmail.com> writes:
> Is there any official web site with the documentation for Python 3.0
> in html format?

You mean like this?

http://docs.python.org/dev/3.0/

Stefan


From facundobatista at gmail.com  Tue May 27 18:09:57 2008
From: facundobatista at gmail.com (Facundo Batista)
Date: Tue, 27 May 2008 13:09:57 -0300
Subject: [Python-3000] Py3 docs
In-Reply-To: <e27efe130805270906g182cd18fg28be9cdf29e2c1b7@mail.gmail.com>
References: <e04bdf310805270901j658f9eb2v151da0fadae05ca9@mail.gmail.com>
	<e27efe130805270906g182cd18fg28be9cdf29e2c1b7@mail.gmail.com>
Message-ID: <e04bdf310805270909s6111c1d1je6f52d7068686f44@mail.gmail.com>

2008/5/27 Amaury Forgeot d'Arc <amauryfa at gmail.com>:

> Well, I use this one every day:
> http://docs.python.org/dev/3.0/

Ah, didn't know about this (I was reading the .rst directly before).

Thanks!

-- 
. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

From guido at python.org  Tue May 27 18:49:47 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 May 2008 09:49:47 -0700
Subject: [Python-3000] dbm package creation
In-Reply-To: <bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>
References: <g1ce83$rb6$1@ger.gmane.org>
	<bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>
Message-ID: <ca471dc20805270949k6e3b1ab2p49b51176a71bcb48@mail.gmail.com>

On Sun, May 25, 2008 at 3:08 PM, Brett Cannon <brett at python.org> wrote:
> On Sun, May 25, 2008 at 12:21 PM, Georg Brandl <g.brandl at gmx.net> wrote:
>> Hi,
>>
>> I'll handle the PEP 3108 dbm package if nobody else is already at it.
>>
>
> I know I have not started the work.
>
>> Two questions though:
>>
>> * the whichdb() function returns strings that are module names.  These
>>  names won't be importable anymore in 3k.  Should the return values
>>  remain the same in 3k, or should whichdb() return the new names, and
>>  if the latter, including "dbm." or not?
>>
>
> New names with the package name prepended.
>
> Should probably change the API at some point to just return the module
> to use instead of the name.

I'm not sure I disagree. I see the return value as an enum, only one
use for which is to import it. (If you wanted to just use the module,
why not use anydbm?) I'd prefer to keep the return strings the same
(no 'dbm.' prefix) and fix the code that uses whichdb.

Or is there an expected future use case where the returned value would
be something in a *different* package?

Returning a module object would seem the least attractive version --
that would require importing the module, which may not be in the
caller's plan at all.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue May 27 19:04:45 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 May 2008 10:04:45 -0700
Subject: [Python-3000] str(container) calls repr() (was: PEP 3138)
In-Reply-To: <20080527083612.GA15216@phd.pp.ru>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru>
	<483BC2B3.6040308@gmail.com> <20080527083612.GA15216@phd.pp.ru>
Message-ID: <ca471dc20805271004u243422fdkc7aa71dc9980b9ff@mail.gmail.com>

On Tue, May 27, 2008 at 1:36 AM, Oleg Broytmann <phd at phd.pp.ru> wrote:
> On Tue, May 27, 2008 at 06:13:39PM +1000, Nick Coghlan wrote:
>> str([1, 2, 3]), str(list("123")) and str(["1, 2, 3"]) should all produce
>> distinctive output: calling repr() on container contents achieves this,
>> calling str() does not.
>
>   String representation is a special case and *the only* special case, and
> must be handled as a special case. I don't like this special case to be used
> as a model for all other types. No other type allows usage like list("123").
>
>> While it could be argued that if you want unambiguous output you should
>> be invoking repr() on the container instead of str(), I'm still seeing
>> many more downsides than upsides to the idea of making str() on the
>> builtin containers display their contents with str() instead of repr().
>
>   The decision should be upon the user. In an ideal world str(container)
> calls str() on items, and repr(container) calls repr() on items, so the
> user can choose what [s]he wants. Currently the user is just stuck with repr().

I disagree. Calling str() on items is counterproductive.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Tue May 27 20:53:25 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 May 2008 14:53:25 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <483B72B8.30906@gmail.com>
References: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
	<fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
	<20080526232402.GC8849@phd.pp.ru>
	<aac2c7cb0805261700k78bceb80g9ddec2465721abf7@mail.gmail.com>
	<fb6fbf560805261913neb157dya75f85494953ae95@mail.gmail.com>
	<483B72B8.30906@gmail.com>
Message-ID: <fb6fbf560805271153m66b32261n4593029c75d4f946@mail.gmail.com>

On 5/26/08, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Jim Jewett wrote:

> > The problem is classes where str(x) != repr(x), and how they get
> > messed up when a container holding (one of their) instances is
> > printed.

> This is NOT a bug, since str([1, 2, 3]) and str(list("123")) SHOULD produce
> results that look different.

Ideally, but it isn't that important.  repr(1) and repr("1") need to
be different, but if str(1) and str("1") look alike, that is
acceptable.  It already happens with

     >>> "%s %s " % (1, "1")

> Calling str() internally to display the
> contents of containers is a broken idea.

If you are using it for precise debugging, you should use repr -- and
repr should be used all the way down.  If you're using str because you
want (fairly) readable output, then str should be used all the way
down.

> The ambiguity that recursive calls
> to str() would introduce would make any concerns about potential confusion
> between different Unicode glyphs seem utterly inconsequential.

So don't do it on repr -- do it only on str -- but always do it on
str, even when str falls back to repr for a particular level of
recursion.

-jJ

From jimjjewett at gmail.com  Tue May 27 21:08:18 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 May 2008 15:08:18 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
Message-ID: <fb6fbf560805271208xa425820tb1fed0e1ae1859d1@mail.gmail.com>

On 5/26/08, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Jim Jewett writes:

>   > The only reason for this change is that __repr__ gets used when
>   > __str__ *should* be used instead.

> That's not what the advocates say.

I still haven't seen a use case where it *should* be using repr *and*
needs to print outside of ASCII.

There are plenty of cases where it *is* using repr because
str(container) fell back to repr, and then the contained strings stay
in repr instead of shifting back to str.  I just haven't seen any
where repr is the *right* function, as opposed to what they're stuck
with because a container doesn't implement a separate __str__.  [The
file exceptions *may* be a separate case, because of tracebacks using
repr to print, but I'm not sure even there.]

> Well, in `String Conversions', the manual says """In particular,
>  converting a string adds quotes around it and converts
>  "funny" characters to escape sequences that are safe to print."""

>  Now, I agree with you about what's "safe".  However, in a text-
>  processing application in a Japanese environment, that's hardly
>  useful, and our Japanese programmer can argue that in his environment,
>  printing all of Unicode *is* safe.

I think he or she will still be wrong, because of confusables -- it is
just that "unsafe" characters are far more rare (since byte value
alone isn't a problem) and the cost of not printing non-ASCII
characters is higher.

So I suggest that he or she use str, rather than repr -- and that we
fix containers to make this possible.

>   > Again -- *why* is repr used instead of str?

> I don't use it myself other than as a way of diagnosing bugs in
>  programs I write or maintain; in personal practice, I'm in your camp.
>  But my understanding is that there is often an intermediate level,
>  such as a website admin, who needs *some* of the precision of repr()
>  such as escaped representation of whitespace, but also needs to be
>  able to read most of the output.

Could someone who does need this explain more?

I understand wanting the two side-by-side.  I sometimes want that with hex.

I understand wanting a container's contents to be readable.  I realize
you can't easily get that today, and consider that a bug.  (Nick's
disagreement noted.)

I don't understand needing *exactly* whitespace escaped, but not, say,
stray characters from scripts you've never used, even though the rest
of the page *is* in an expected script.

-jJ

From jimjjewett at gmail.com  Tue May 27 21:17:29 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 May 2008 15:17:29 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <aac2c7cb0805262152p4b0c5919ra2022c9ddded304f@mail.gmail.com>
References: <ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<20080524171814.GA4026@phd.pp.ru>
	<fb6fbf560805261553k71ea4989ma9bb197550ea92cd@mail.gmail.com>
	<20080526232402.GC8849@phd.pp.ru>
	<aac2c7cb0805261700k78bceb80g9ddec2465721abf7@mail.gmail.com>
	<fb6fbf560805261913neb157dya75f85494953ae95@mail.gmail.com>
	<aac2c7cb0805262152p4b0c5919ra2022c9ddded304f@mail.gmail.com>
Message-ID: <fb6fbf560805271217v293e9addj5a0df1de2b271386@mail.gmail.com>

On 5/27/08, Adam Olsen <rhamph at gmail.com> wrote:
> On Mon, May 26, 2008 at 8:13 PM, Jim Jewett <jimjjewett at gmail.com> wrote:

>  > The problem is classes where str(x) != repr(x), and how they get
>  > messed up when a container holding (one of their) instances is
>  > printed.

>  >>>> class A:
>  >>>>        def __str__(self):  return "an A"
>  >>>> a=A()

>  >>>> print a    # this is fine.
>  > an A
>  >>>> str(a)      # this is OK, you have asked for "%s" % a
>  > 'an A'
>  >>>> repr(a)   # this is OK, you wanted repr explicitly.
>  > '<__main__.A instance at 0x012DDAF8>'

>  >>>> print ([a])   # this stinks ...
>  > [<__main__.A instance at 0x012DDAF8>]

>  > It would be much better as:

>  >>>> print ([a])   # after fixing the recursion bug
>  > ['an a']

>  Hmm, I see where the confusion is.  Containers only define __repr__,
>  so although you think it's the list.__str__ that's mistakenly using
>  repr(), it's str(list) itself that's calling repr(list).

Exactly.
And the fact that it then calls repr (rather than str) on the contents
-- even though the user asked for str -- is what I (but not Nick)
consider a bug.  I believe this bug is also the only real source of
the pain that motivates PEP 3138.

>  So the question to ask is whether we can define a useful __str__ for
>  containers.  str(['an a']) -> '[an a]' is not too bad, but
>  str(['hello, world']) -> '[hello, world]' is ambiguous.  It crosses
>  the line into garbage.

I would consider '["hello, world"]' to be perfectly acceptable; the
conversion to str was explicit.

But to be honest, I would also accept  '[hello, world]' despite the
ambiguity -- if ambiguity is a problem, then str probably isn't the
right function.  (Admittedly, that does increase the pressure for a
3rd case in between; I'm just not sure that there would be enough need
for that in-between, if str worked "all the way down".)

>  Along the way I've found a new definition of repr: unambiguous when
>  used in a container's repr.

err ... actually even that isn't met, if you look too hard at corner
(or malicious) cases.  But I agree that it is a good goal -- which
need not apply to a container's __str__.

-jJ
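
The example quoted above can be reproduced in Python 3 roughly as follows.
StrList is a hypothetical subclass used only to show what "str all the way
down" would print; it is an assumption for illustration, not the change
anyone in the thread actually specified.

```python
class A:
    def __str__(self):
        return "an A"

class StrList(list):
    """Hypothetical list whose str() applies str() to each item."""
    def __str__(self):
        return "[" + ", ".join(str(item) for item in self) + "]"

a = A()

# list defines only __repr__, so str() falls back to it and repr(a) is used.
plain = str([a])

# With a container-level __str__, the item's own __str__ is honoured.
nested = str(StrList([a]))

# The cost: one string containing a comma looks like two items.
ambiguous = str(StrList(["hello, world"]))
```

Here nested is "[an A]", while ambiguous is "[hello, world]", the same
text that StrList(["hello", "world"]) would produce.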

From jimjjewett at gmail.com  Tue May 27 21:33:02 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 May 2008 15:33:02 -0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <483BE3D8.4080806@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<483BE3D8.4080806@egenix.com>
Message-ID: <fb6fbf560805271233i5412e271sea69f9e80b935800@mail.gmail.com>

On 5/27/08, M.-A. Lemburg <mal at egenix.com> wrote:
> On 2008-05-27 05:02, Stephen J. Turnbull wrote:
> > ... repr() Spanish and Chinese users are going to feel more or less
> > differently from Americans about which characters should be escaped.

>  I'm not sure that's always the case, but users should certainly
>  have the freedom to decide whether they prefer backslashed quoted
>  code points or glyphs on their screen.

Agreed, and they already do if they go far enough out of their way to
be explicit.

The question is what to do by default.

We agree that, by default, str(x) should display glyphs when possible.
 (And changing this is hard in practice, even if you don't recognize
the glyphs.)

We agree that, by default today, repr uses backslash.  (And changing
this is hard, even if you do recognize the glyphs.)

We also agree that in many cases, people want the glyphs but get
a backslash.

The only disagreement is over how to fix this.

PEP 3138 says that repr should start printing unicode glyphs.

I say that repr should (instead) start recognizing when it was called
in place of __str__, and should revert back to __str__ when it
recurses down to the next level.

-jJ

From guido at python.org  Tue May 27 21:42:48 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 May 2008 12:42:48 -0700
Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka String
	ABC)
In-Reply-To: <loom.20080527T192243-415@post.gmane.org>
References: <loom.20080527T192243-415@post.gmane.org>
Message-ID: <ca471dc20805271242q63e80104mfad81d1b1014df60@mail.gmail.com>

[+python-3000]

On Tue, May 27, 2008 at 12:32 PM, Armin Ronacher
<armin.ronacher at active-4.com> wrote:
> Strings are currently iterable and it was stated multiple times that this is a
> good idea and shouldn't change.  While I still don't think that that's a good
> idea I would like to propose a solution for the problem many people are
> experiencing by introducing an abstract base class for strings.
>
> Basically *the* problematic situation with iterable strings is something like
> a `flatten` function that flattens out every iterable object except strings.
> Imagine it's implemented in a way similar to that::
>
>    def flatten(iterable):
>        for item in iterable:
>            try:
>                if isinstance(item, basestring):
>                    raise TypeError()
>                iterator = iter(item)
>            except TypeError:
>                yield item
>            else:
>                for i in flatten(iterator):
>                    yield i
>
> A problem comes up as soon as a user-defined string (such as UserString) is
> passed to the function.  In my opinion a good solution would be a "String"
> ABC one could test against.

I'm not against this, but so far I've not been able to come up with a
good set of methods to endow the String ABC with. Another problem is
that not everybody draws the line in the same place -- how should
instances of bytes, bytearray, array.array, memoryview (buffer in 2.6)
be treated?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Tue May 27 22:03:51 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 May 2008 16:03:51 -0400
Subject: [Python-3000] UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <483B7668.1090800@gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com>
Message-ID: <fb6fbf560805271303vd22934dtfa1a4f4161b5e654@mail.gmail.com>

On 5/26/08, Nick Coghlan <ncoghlan at gmail.com> wrote:
> > The problem isn't that I want to be able to write code that acts the
> > old way; the problem is that I want to ensure all code running on my
> > system acts the old way.

>  This is for Py3k - you'll be lucky if your old code runs at all, let alone
> in the same way.

Again, this isn't about code I wrote; it is about code someone else
wrote.  If they used a new function designed to display unicode, then
I know it was intentional.  If they used repr, then it is quite likely
that they were using 2.x repr, and just didn't consider the non-ASCII
case.

> > Keeping repr and fixing the way it recurses when used as a str
> > fallback would be even better.

> No it wouldn't - the ambiguity introduced by doing so would
> dwarf anything

It wouldn't add *any* ambiguity when someone called repr explicitly.
When they called str explicitly, ambiguity would occur exactly for
objects where it is already tolerable for str.  (Because these same
objects would already be ambiguous if they were top-level objects
instead of contained subobjects.)

>  In terms of actual IO for display to a user, why do you care if something
> gets backslash replaced or not? The characters which are replaced will only
> be those which the user's terminal can't display anyway.

[Assuming non-buggy terminals, yes.]

My biggest concern is with characters that the *terminal* can display
fine, but which the *human* will not recognize.  (At least not without
some emphasis warning them that something is unexpected.)

For str, those characters are not a problem -- if I don't notice them,
then they (almost by definition) are not crucial to me.

For repr, that is a problem.  If I am using repr, then I want
attention called to anything unexpected.  The fact that a character
might be common somewhere else doesn't matter -- *I* wasn't expecting
it *in my environment*.  A system-level switch to add expected
characters is fine.  A generic assumption that anything printable is
expected -- that is not fine.

-jJ

From jyasskin at gmail.com  Tue May 27 22:04:02 2008
From: jyasskin at gmail.com (Jeffrey Yasskin)
Date: Tue, 27 May 2008 15:04:02 -0500
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805271233i5412e271sea69f9e80b935800@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<483BE3D8.4080806@egenix.com>
	<fb6fbf560805271233i5412e271sea69f9e80b935800@mail.gmail.com>
Message-ID: <5d44f72f0805271304i5875fe0bn763c977e5e63cb5f@mail.gmail.com>

On Tue, May 27, 2008 at 2:33 PM, Jim Jewett <jimjjewett at gmail.com> wrote:
> I say that repr should (instead) start recognizing when it was called
> in place of __str__, and should revert back to __str__ when it
> recurses down to the next level.

That sounds like a PEP you should write, which, if accepted, might
obviate some of the rationale for this one.

-- 
Namasté,
Jeffrey Yasskin

From phd at phd.pp.ru  Tue May 27 22:14:50 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Wed, 28 May 2008 00:14:50 +0400
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <5d44f72f0805271304i5875fe0bn763c977e5e63cb5f@mail.gmail.com>
References: <87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<483BE3D8.4080806@egenix.com>
	<fb6fbf560805271233i5412e271sea69f9e80b935800@mail.gmail.com>
	<5d44f72f0805271304i5875fe0bn763c977e5e63cb5f@mail.gmail.com>
Message-ID: <20080527201450.GA29645@phd.pp.ru>

On Tue, May 27, 2008 at 03:04:02PM -0500, Jeffrey Yasskin wrote:
> On Tue, May 27, 2008 at 2:33 PM, Jim Jewett <jimjjewett at gmail.com> wrote:
> > I say that repr should (instead) start recognizing when it was called
> > in place of __str__, and should revert back to __str__ when it
> > recurses down to the next level.
> 
> That sounds like a PEP you should write, which, if accepted, might
> obviate some of the rationale for this one.

   I have written the PEP. The draft is being discussed now in the Russian
Python and Zope mailing list. I will post it here tomorrow.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From benji at benjiyork.com  Tue May 27 22:09:47 2008
From: benji at benjiyork.com (Benji York)
Date: Tue, 27 May 2008 16:09:47 -0400
Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka String
	ABC)
In-Reply-To: <ca471dc20805271242q63e80104mfad81d1b1014df60@mail.gmail.com>
References: <loom.20080527T192243-415@post.gmane.org>
	<ca471dc20805271242q63e80104mfad81d1b1014df60@mail.gmail.com>
Message-ID: <e5fff6640805271309ybe102d3y3526f0e32141df2@mail.gmail.com>

On Tue, May 27, 2008 at 3:42 PM, Guido van Rossum <guido at python.org> wrote:
> [+python-3000]
>
> On Tue, May 27, 2008 at 12:32 PM, Armin Ronacher
> <armin.ronacher at active-4.com> wrote:
>> Basically *the* problematic situation with iterable strings is something like
>> a `flatten` function that flattens out every iterable object except strings.
>> Imagine it's implemented in a way similar to that::
>
> I'm not against this, but so far I've not been able to come up with a
> good set of methods to endow the String ABC with. Another problem is
> that not everybody draws the line in the same place -- how should
> instances of bytes, bytearray, array.array, memoryview (buffer in 2.6)
> be treated?

Maybe the opposite approach would be more fruitful.  Flattening is about
removing nested "containers", so perhaps there should be an ABC that
things like lists and tuples provide, but strings don't.  No idea what
that might be.
-- 
Benji York
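
For comparison, a Python 3 sketch of the flatten() under discussion is
given below. It takes the blunt route of listing the types to keep whole;
the ATOMIC_TYPES tuple is an assumption and is one possible answer to the
"where to draw the line" question raised above, not a settled one.

```python
from collections.abc import Iterable

# Types treated as atoms even though they are iterable. Which types belong
# here is the open question in the thread; this tuple is only one choice.
ATOMIC_TYPES = (str, bytes, bytearray, memoryview)

def flatten(iterable):
    """Yield the leaves of arbitrarily nested iterables, keeping strings whole."""
    for item in iterable:
        if isinstance(item, Iterable) and not isinstance(item, ATOMIC_TYPES):
            yield from flatten(item)
        else:
            yield item

result = list(flatten([1, "ab", [2, ("cd", [3, b"ef"])], {4}]))
```

This still has the weakness Armin describes: a user-defined string type
that does not inherit from str (collections.UserString, for instance) is
not recognised by the isinstance() check.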

From rhamph at gmail.com  Tue May 27 22:21:57 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Tue, 27 May 2008 14:21:57 -0600
Subject: [Python-3000] UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <fb6fbf560805271303vd22934dtfa1a4f4161b5e654@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com>
	<fb6fbf560805271303vd22934dtfa1a4f4161b5e654@mail.gmail.com>
Message-ID: <aac2c7cb0805271321g324eec8bpca9e6d809f9e8f9d@mail.gmail.com>

On Tue, May 27, 2008 at 2:03 PM, Jim Jewett <jimjjewett at gmail.com> wrote:
> On 5/26/08, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> > The problem isn't that I want to be able to write code that acts the
>> > old way; the problem is that I want to ensure all code running on my
>> > system acts the old way.
>
>>  This is for Py3k - you'll be lucky if your old code runs at all, let alone
>> in the same way.
>
> Again, this isn't about code I wrote; it is about code someone else
> wrote.  If they used a new function designed to display unicode, then
> I know it was intentional.  If they used repr, then it is quite likely
> that they were using 2.x repr, and just didn't consider the non-ASCII
> case.

Welcome to 3.0: unicode is now the norm.


>> > Keeping repr and fixing the way it recurses when used as a str
>> > fallback would be even better.
>
>> No it wouldn't - the ambiguity introduced by doing so would
>> dwarf anything
>
> It wouldn't add *any* ambiguity when someone called repr explicitly.
> When they called str explicitly, ambiguity would occur exactly for
> objects where it is already tolerable for str.  (Because these same
> objects would already be ambiguous if they were top-level objects
> instead of contained subobjects.)

I don't think str() is normally used on containers.  str(3) and
str('hello') are shallow and explicit - not ambiguous.  The fact that
we fallback to repr() when there is no sensible __str__ means we can
use print(obj) for debugging and have it Just Work.

If you really cared we could remove the fallback behaviour, raising a
TypeError instead, but this won't do anything to help PEP 3138.  We'd
need a third function that applies to containers (like repr),
differing only in how it handles non-ascii.  PEP 3138 already provides
a simple solution for this though: ascii_repr().  It's just not the
default repr().


-- 
Adam Olsen, aka Rhamphoryncus
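
As a point of reference, released Python 3 versions ship a builtin ascii()
alongside repr(). The sketch below shows how the two differ; treating
ascii() as the equivalent of the ascii_repr() named above is an assumption
about naming, not something stated in this message.

```python
text = "caf\u00e9"      # 'cafe' with an acute accent, escaped to keep the source ASCII

shown = repr(text)      # printable non-ASCII characters are kept as-is
escaped = ascii(text)   # like repr(), but non-ASCII is backslash-escaped
nested = ascii([text])  # the escaping also applies to container contents
```

Both functions recurse into containers, so the choice between glyphs and
escapes is made once at the call site rather than per element.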

From gregor.lingl at aon.at  Tue May 27 22:48:25 2008
From: gregor.lingl at aon.at (Gregor Lingl)
Date: Tue, 27 May 2008 22:48:25 +0200
Subject: [Python-3000] how to deal with compatibility problems (example:
	turtle module)
Message-ID: <483C7399.3050103@aon.at>

Hi,

when doing some final checking of the new turtle module I ran into the
following problem, which I'd like to discuss with the intention to clarify
how to handle problems that result more or less from suboptimal design
decisions in the module being replaced:

(1) As requested I added an __all__ variable to the new turtle module
to define those names, that will be imported by: from turtle import *
Of course I consider this very useful. (The old module didn't have an
__all__ variable)

(2) Moreover it was requested that the new turtle module be fully
compatible with the old one.

(3) The old module has a from math import *  statement, which
results in importing all names from math when doing from turtle import *.
Moreover, turtle.py defines two functions that override
the corresponding functions from math (namely degrees() and radians()).
Is this a feature which should be retained? (I suppose that it was not
intended by the developer of the old turtle module but happened
somehow.)

If so, I would have to add all names from dir(math) to my __all__ variable
(except those two mentioned above).

My personal opinion is, that this would be a rather ugly solution, and
I think that this 'feature' should at least be eliminated in the Python 3.0
version.

On the other hand one could argue, that most (if not all) of the functions
in math are normally not used by users of turtle, and those who use it
certainly know how to import what they need. So one could drop the
from math import * already in Python 2.6. But, of course, this argument
doesn't consider the possibility of breaking some old code.

I'm also interested in how to proceed with this, because there are a few
similar problems with the turtle module which should be solved with the
transition from Python 2.6 to Python 3.0.

So, generally, which guidelines should one use to decide on problems like
this - and who is the one who decides?

With best regards
Gregor
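
The effect described in (1) to (3) can be sketched with two throwaway
modules. The module names old_turtle_demo and new_turtle_demo and the
forward() stub are hypothetical stand-ins, not the real turtle code.

```python
import math
import sys
import types

# Stand-in for the old module: star-imports math and has no __all__.
old = types.ModuleType("old_turtle_demo")
exec("from math import *\ndef forward(distance): pass\n", old.__dict__)

# Stand-in for the new module: same body plus an explicit __all__.
new = types.ModuleType("new_turtle_demo")
exec("from math import *\ndef forward(distance): pass\n__all__ = ['forward']\n",
     new.__dict__)

sys.modules[old.__name__] = old
sys.modules[new.__name__] = new

old_ns, new_ns = {}, {}
exec("from old_turtle_demo import *", old_ns)
exec("from new_turtle_demo import *", new_ns)

leaks_math = "sqrt" in old_ns      # math names pass straight through
hides_math = "sqrt" not in new_ns  # __all__ filters them out

# Names that would have to be appended to __all__ for full compatibility.
compat_names = [n for n in dir(math)
                if not n.startswith("_") and n not in ("degrees", "radians")]
```

Appending compat_names to __all__ would restore the old star-import
behaviour, at the cost of the "rather ugly" export list mentioned above.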


From g.brandl at gmx.net  Tue May 27 22:48:27 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 27 May 2008 22:48:27 +0200
Subject: [Python-3000] dbm package creation
In-Reply-To: <ca471dc20805270949k6e3b1ab2p49b51176a71bcb48@mail.gmail.com>
References: <g1ce83$rb6$1@ger.gmane.org>	<bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>
	<ca471dc20805270949k6e3b1ab2p49b51176a71bcb48@mail.gmail.com>
Message-ID: <g1hs2t$6a5$2@ger.gmane.org>

Guido van Rossum schrieb:
> On Sun, May 25, 2008 at 3:08 PM, Brett Cannon <brett at python.org> wrote:
>> On Sun, May 25, 2008 at 12:21 PM, Georg Brandl <g.brandl at gmx.net> wrote:
>>> Hi,
>>>
>>> I'll handle the PEP 3108 dbm package if nobody else is already at it.
>>>
>>
>> I know I have not started the work.
>>
>>> Two questions though:
>>>
>>> * the whichdb() function returns strings that are module names.  These
>>>  names won't be importable anymore in 3k.  Should the return values
>>>  remain the same in 3k, or should whichdb() return the new names, and
>>>  if the latter, including "dbm." or not?
>>>
>>
>> New names with the package name prepended.
>>
>> Should probably change the API at some point to just return the module
>> to use instead of the name.
> 
> I'm not sure I disagree. I see the return value as an enum, only one
> use for which is to import it. (If you wanted to just use the module,
> why not use anydbm?) I'd prefer to keep the return strings the same
> (no 'dbm.' prefix) and fix the code that uses whichdb.

So add a mapping to dbm.__init__ that maps old names to new names?

> Or is there an expected future use case where the returned value would
> be something in a *different* package?

There was in the past, with the now-defunct bsddb185 module which was
not used by anydbm.

> Returning a module object would seem the least attractive version --
> that would require importing the module, which may not be in the
> caller's plan at all.

It may not be, but the modules are imported anyway during import of
dbm.__init__ (which contains whichdb() now.)

Georg


From guido at python.org  Tue May 27 22:57:28 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 May 2008 13:57:28 -0700
Subject: [Python-3000] how to deal with compatibility problems (example:
	turtle module)
In-Reply-To: <483C7399.3050103@aon.at>
References: <483C7399.3050103@aon.at>
Message-ID: <ca471dc20805271357m570b875aja710b88b3da5e72a@mail.gmail.com>

The old turtle.py explicitly says

from math import * # Also for export

so I think it's desirable to keep this behavior. My intent with that
line was that an absolute beginner could put "from turtle import *" in
their interactive session and be able to use both the turtle code and
the high-school math functions that might come in handy, like sin()
and cos(). The other math functions don't really hurt, I believe. Where
there's a naming conflict, obviously the turtle module wins.

--Guido

On Tue, May 27, 2008 at 1:48 PM, Gregor Lingl <gregor.lingl at aon.at> wrote:
> Hi,
>
> when doing some final checking of the new turtle module I ran into the
> following problem, which I'd like to discuss with the intention to clarify
> how to handle problems that result more or less from suboptimal design
> decisions of the module to replace.:
>
> (1) As requested I added an __all__ variable to the new turtle module
> to define those names, that will be imported by: from turtle import *
> Of course I consider this very useful. (The old module didn't have an
> __all__ variable)
>
> (2) Moreover it was requested that  the new  turtle  module be  fully
> compatible with the old one.
>
> (3) The old module has a from math import *  statement, which
> results in importing all names from math when doing from turtle import *.
> Moreover there are defined two functions in turtle.py which overwrite
> the corresponding functions from math (namely degrees() and radians())
> Is this a feature which should be retained? (I suppose that it was not
> intended by the developer of the old turtle module but happened
> somehow.)
>
> If so, I had to add all names from dir(math) to my __all__ variable
> (except those two mentioned above).
>
> My personal opinion is, that this would be a rather ugly solution, and
> I think that this 'feature' should at least be eliminated in the Python 3.0
> version.
>
> On the other hand one could argue, that most (if not all) of the functions
> in math are normally not used by users of turtle, and those who use it
> certainly know how to import what they need. So one could drop the
> from math import * already in Python 2.6. But, of course, this argument
> doesn't consider the possibility of breaking some old code.
>
> I'm also interested in how to proceed with this, because there are a few
> similar problems with the turtle module which should be solved with the
> transition from Python2.6 to Python3.0
>
> So, generally, which guidelines should one use to decide  on problems like
> this - and who is the one who decides?
>
> With best regards
> Gregor


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Tue May 27 22:59:10 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 May 2008 13:59:10 -0700
Subject: [Python-3000] dbm package creation
In-Reply-To: <g1hs2t$6a5$2@ger.gmane.org>
References: <g1ce83$rb6$1@ger.gmane.org>
	<bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>
	<ca471dc20805270949k6e3b1ab2p49b51176a71bcb48@mail.gmail.com>
	<g1hs2t$6a5$2@ger.gmane.org>
Message-ID: <ca471dc20805271359t7a23995cn399342f371d07cdc@mail.gmail.com>

On Tue, May 27, 2008 at 1:48 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> Guido van Rossum schrieb:
>>
>> On Sun, May 25, 2008 at 3:08 PM, Brett Cannon <brett at python.org> wrote:
>>>
>>> On Sun, May 25, 2008 at 12:21 PM, Georg Brandl <g.brandl at gmx.net> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'll handle the PEP 3108 dbm package if nobody else is already at it.
>>>>
>>>
>>> I know I have not started the work.
>>>
>>>> Two questions though:
>>>>
>>>> * the whichdb() function returns strings that are module names.  These
>>>>  names won't be importable anymore in 3k.  Should the return values
>>>>  remain the same in 3k, or should whichdb() return the new names, and
>>>>  if the latter, including "dbm." or not?
>>>>
>>>
>>> New names with the package name prepended.
>>>
>>> Should probably change the API at some point to just return the module
>>> to use instead of the name.
>>
>> I'm not sure I disagree. I see the return value as an enum, only one
>> use for which is to import it. (If you wanted to just use the module,
>> why not use anydbm?) I'd prefer to keep the return strings the same
>> (no 'dbm.' prefix) and fix the code that uses whichdb.
>
> So add a mapping to dbm.__init__ that maps old names to new names?

Is the mapping not just 'dbm.' + x?

>> Or is there an expected future use case where the returned value would
>> be something in a *different* package?
>
> There was in the past, with the now-defunct bsddb185 module which was
> not used by anydbm.
>
>> Returning a module object would seem the least attractive version --
>> that would require importing the module, which may not be in the
>> caller's plan at all.
>
> It may not be, but the modules are imported anyway during import of
> dbm.__init__ (which contains whichdb() now.)

Hm, that's a regression if you ask me. Couldn't you use lazy import?

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Tue May 27 23:16:53 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 27 May 2008 23:16:53 +0200
Subject: [Python-3000] dbm package creation
In-Reply-To: <ca471dc20805271359t7a23995cn399342f371d07cdc@mail.gmail.com>
References: <g1ce83$rb6$1@ger.gmane.org>	<bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>	<ca471dc20805270949k6e3b1ab2p49b51176a71bcb48@mail.gmail.com>	<g1hs2t$6a5$2@ger.gmane.org>
	<ca471dc20805271359t7a23995cn399342f371d07cdc@mail.gmail.com>
Message-ID: <g1hto7$dbp$1@ger.gmane.org>

Guido van Rossum schrieb:

>>> I'm not sure I disagree. I see the return value as an enum, only one
>>> use for which is to import it. (If you wanted to just use the module,
>>> why not use anydbm?) I'd prefer to keep the return strings the same
>>> (no 'dbm.' prefix) and fix the code that uses whichdb.
>>
>> So add a mapping to dbm.__init__ that maps old names to new names?
> 
> Is the mapping not just 'dbm.' + x?

No. The mapping is

dbhash  -> dbm.bsd
dbm     -> dbm.ndbm (*)
gdbm    -> dbm.gnu  (*)
dumbdbm -> dbm.dumb

(*) Not exactly; the original C modules are now called _dbm and _gdbm,
     and the submodules are stubs that import those.
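
As a sketch, the table above could be written as a plain dictionary
(OLD_TO_NEW and new_module_name() are hypothetical names, not something
that exists in dbm.__init__):

```python
# Old Python 2 module names mapped to the new submodule names listed above.
OLD_TO_NEW = {
    "dbhash": "dbm.bsd",
    "dbm": "dbm.ndbm",
    "gdbm": "dbm.gnu",
    "dumbdbm": "dbm.dumb",
}

def new_module_name(old_name):
    """Translate an old whichdb()-style name to its new dotted name."""
    return OLD_TO_NEW[old_name]

# None of the entries can be produced by simply prepending "dbm.".
prefix_rule_works = any(
    "dbm." + old == new for old, new in OLD_TO_NEW.items()
)
```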

>>> Or is there an expected future use case where the returned value would
>>> be something in a *different* package?
>>
>> There was in the past, with the now-defunct bsddb185 module which was
>> not used by anydbm.
>>
>>> Returning a module object would seem the least attractive version --
>>> that would require importing the module, which may not be in the
>>> caller's plan at all.
>>
>> It may not be, but the modules are imported anyway during import of
>> dbm.__init__ (which contains whichdb() now.)
> 
> Hm, that's a regression if you ask me. Couldn't you use lazy import?

There's a module attribute "error" -- supposed to be a tuple of all
possible errors from the db modules; that is hard to make lazy.

Of course we could solve this by making all the different db module
errors subclasses of a common exception (but since most of them are
defined in C modules, this is hard again.)

Georg


From guido at python.org  Tue May 27 23:30:45 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 May 2008 14:30:45 -0700
Subject: [Python-3000] dbm package creation
In-Reply-To: <g1hto7$dbp$1@ger.gmane.org>
References: <g1ce83$rb6$1@ger.gmane.org>
	<bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>
	<ca471dc20805270949k6e3b1ab2p49b51176a71bcb48@mail.gmail.com>
	<g1hs2t$6a5$2@ger.gmane.org>
	<ca471dc20805271359t7a23995cn399342f371d07cdc@mail.gmail.com>
	<g1hto7$dbp$1@ger.gmane.org>
Message-ID: <ca471dc20805271430g147ab6f9u232b7af8ecfd8973@mail.gmail.com>

On Tue, May 27, 2008 at 2:16 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> Guido van Rossum schrieb:
>
>>>> I'm not sure I disagree. I see the return value as an enum, only one
>>>> use for which is to import it. (If you wanted to just use the module,
>>>> why not use anydbm?) I'd prefer to keep the return strings the same
>>>> (no 'dbm.' prefix) and fix the code that uses whichdb.
>>>
>>> So add a mapping to dbm.__init__ that maps old names to new names?
>>
>> Is the mapping not just 'dbm.' + x?
>
> No. The mapping is
>
> dbhash  -> dbm.bsd
> dbm     -> dbm.ndbm (*)
> gdbm    -> dbm.gnu  (*)
> dumbdbm -> dbm.dumb
>
> (*) Not exactly; the original C modules are now called _dbm and _gdbm,
>    and the submodules are stubs that import those.

OK. I see. Hadn't remembered how messy it was. :-(

I withdraw my opposition to returning module names. I still think
returning a module would be a bad idea.

>>>> Or is there an expected future use case where the returned value would
>>>> be something in a *different* package?
>>>
>>> There was in the past, with the now-defunct bsddb185 module which was
>>> not used by anydbm.
>>>
>>>> Returning a module object would seem the least attractive version --
>>>> that would require importing the module, which may not be in the
>>>> caller's plan at all.
>>>
>>> It may not be, but the modules are imported anyway during import of
>>> dbm.__init__ (which contains whichdb() now.)
>>
>> Hm, that's a regression if you ask me. Couldn't you use lazy import?
>
> There's a module attribute "error" -- supposed to be a tuple of all
> possible errors from the db modules; that is hard to make lazy.
>
> Of course we could solve this by making all the different db module
> errors subclasses of a common exception (but since most of them are
> defined in C modules, this is hard again.)

OK, let's make the latter a stretch goal. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From g.brandl at gmx.net  Tue May 27 23:37:09 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Tue, 27 May 2008 23:37:09 +0200
Subject: [Python-3000] dbm package creation
In-Reply-To: <ca471dc20805271430g147ab6f9u232b7af8ecfd8973@mail.gmail.com>
References: <g1ce83$rb6$1@ger.gmane.org>	<bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>	<ca471dc20805270949k6e3b1ab2p49b51176a71bcb48@mail.gmail.com>	<g1hs2t$6a5$2@ger.gmane.org>	<ca471dc20805271359t7a23995cn399342f371d07cdc@mail.gmail.com>	<g1hto7$dbp$1@ger.gmane.org>
	<ca471dc20805271430g147ab6f9u232b7af8ecfd8973@mail.gmail.com>
Message-ID: <g1huu7$hcf$1@ger.gmane.org>

Guido van Rossum schrieb:

>>>>> Or is there an expected future use case where the returned value would
>>>>> be something in a *different* package?
>>>>
>>>> There was in the past, with the now-defunct bsddb185 module which was
>>>> not used by anydbm.
>>>>
>>>>> Returning a module object would seem the least attractive version --
>>>>> that would require importing the module, which may not be in the
>>>>> caller's plan at all.
>>>>
>>>> It may not be, but the modules are imported anyway during import of
>>>> dbm.__init__ (which contains whichdb() now.)
>>>
>>> Hm, that's a regression if you ask me. Couldn't you use lazy import?
>>
>> There's a module attribute "error" -- supposed to be a tuple of all
>> possible errors from the db modules; that is hard to make lazy.
>>
>> Of course we could solve this by making all the different db module
>> errors subclasses of a common exception (but since most of them are
>> defined in C modules, this is hard again.)
> 
> OK, let's make the latter a stretch goal. :-)

I just realized: since dumbdbm's "error" is just IOError, using
"except [any]dbm.error" will always catch IOError.

So the easy solution is to just derive the database error classes
from IOError.

The slightly harder solution is to declare the above a bug, create
a new builtin (at least on C level) error class DBError and derive
them from that.
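
A sketch of the easy solution, with made-up class names standing in for
the backend errors (GnuDbmError and NdbmError are hypothetical; the real
errors are defined in the C modules):

```python
class GnuDbmError(IOError):
    """Hypothetical stand-in for the error raised by the gdbm backend."""

class NdbmError(IOError):
    """Hypothetical stand-in for the error raised by the ndbm backend."""

def describe(exc_type):
    """Raise exc_type and report which class an IOError handler caught."""
    try:
        raise exc_type("database problem")
    except IOError as exc:
        return type(exc).__name__
```

A single "except IOError" then covers every backend without needing a
tuple of all error classes, so nothing has to be imported up front.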

Georg


From gregor.lingl at aon.at  Tue May 27 23:40:57 2008
From: gregor.lingl at aon.at (Gregor Lingl)
Date: Tue, 27 May 2008 23:40:57 +0200
Subject: [Python-3000] how to deal with compatibility problems (example:
 turtle module)
In-Reply-To: <ca471dc20805271357m570b875aja710b88b3da5e72a@mail.gmail.com>
References: <483C7399.3050103@aon.at>
	<ca471dc20805271357m570b875aja710b88b3da5e72a@mail.gmail.com>
Message-ID: <483C7FE9.6070909@aon.at>


Guido van Rossum schrieb:
> The old turtle.py explicitly says
>
> from math import * # Also for export
>
> so I think it's desirable to keep this behavior. My intent with that
> line was that an absolute beginner could put "from turtle import *" in
> their interactive session and be able to use both the turtle code and
> the high-school math functions that might come in handy, like sin()
> and cos(). The other math functions don't really hurt, I believe. Where
> there's a naming conflict, obviously the turtle module wins.
>
> --Guido
>   
Thanks for the quick reply, I'll do it this way.

Gregor

P.S.: I'd just like to add one (critical) remark which results from some
decades working as a highschool teacher and (nearly) one decade working
with Python, and especially turtle graphics with highschool students:

sin() and cos() imported from math work with radians. The default
angle-mode for turtle is degrees. So when using trig-functions I have
to talk about radian measure and conversion of angle units. To calculate
the sine of 30 degrees, for instance, I would have to call sin(radians(30)) etc.,
but unfortunately this very radians() function is no longer available
when doing from turtle import *. So in this case this import is of limited
use. And it definitely makes sense to tell highschool students that
sin(), cos() and friends live in a module called math.
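
For illustration, this is the conversion in question when written against
the math module explicitly (sin_degrees() is a hypothetical helper name):

```python
import math

def sin_degrees(angle_in_degrees):
    """Sine of an angle given in degrees, via math's radians() conversion."""
    return math.sin(math.radians(angle_in_degrees))

# Sine of 30 degrees; the result is 0.5 up to floating-point rounding.
half = sin_degrees(30)
```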


From oliphant.travis at ieee.org  Tue May 27 23:53:59 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Tue, 27 May 2008 16:53:59 -0500
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To: <g1675f$hf4$1@ger.gmane.org>
References: <g1675f$hf4$1@ger.gmane.org>
Message-ID: <g1hvto$kh3$1@ger.gmane.org>

Stefan Behnel wrote:
> Hi,
> 
> while implementing Py_buffer support in Cython, I noticed (the hard way,
> through a segfault), that the buffer pointer passed into getbuffer() can be
> NULL, e.g. when calling memoryview.tobytes(). According to PEP 3118 (first
> paragraph below the getbuffer() signature), this implies setting a lock on the
> memory. Funny enough, the LOCK flag wasn't even set in my case, I just get
> NULL as buffer and 285 as flags...

The memoryview implementation is not yet done.  I'm not sure if that is 
the only issue here.

> 
> Anyway, my point is that this part of the protocol actually implies setting a
> lock on the buffer *provider* rather than the buffer itself, as the buffer
> provider cannot distinguish between different buffers based on a NULL pointer

Yes, the language in the PEP could be more clear.   Obviously, if you 
haven't provided a Py_buffer structure to fill in, then you are only 
asking to lock the object's buffer from other access.

Naturally, the exporter should handle the case when no lock is actually 
requested.

> 
> I know, the protocol is overly complex already and hard to implement from a
> provider perspective, and I understand that that was preferred over putting
> the complexity into the consumer. But wouldn't it make more sense to *always*
> pass the buffer pointer, to let the provider decide what it makes of the
> flags?


Perhaps we are not understanding each other.  The Py_buffer structure 
and the buffer pointer are 2 separate things.  It is the Py_buffer 
structure that can be NULL when getbuffer is called (the buf member of 
the structure is the actual buffer pointer; it is undefined when 
getbuffer is called and it contains the buffer pointer on successful 
return).

Thanks for your probing.

-Travis


From guido at python.org  Wed May 28 00:31:44 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 May 2008 15:31:44 -0700
Subject: [Python-3000] dbm package creation
In-Reply-To: <g1huu7$hcf$1@ger.gmane.org>
References: <g1ce83$rb6$1@ger.gmane.org>
	<bbaeab100805251508i25f94balfdedc9c4033484ff@mail.gmail.com>
	<ca471dc20805270949k6e3b1ab2p49b51176a71bcb48@mail.gmail.com>
	<g1hs2t$6a5$2@ger.gmane.org>
	<ca471dc20805271359t7a23995cn399342f371d07cdc@mail.gmail.com>
	<g1hto7$dbp$1@ger.gmane.org>
	<ca471dc20805271430g147ab6f9u232b7af8ecfd8973@mail.gmail.com>
	<g1huu7$hcf$1@ger.gmane.org>
Message-ID: <ca471dc20805271531w4503749ara75d33d803e63e4d@mail.gmail.com>

On Tue, May 27, 2008 at 2:37 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> Guido van Rossum schrieb:
>
>>>>>> Or is there an expected future use case where the returned value would
>>>>>> be something in a *different* package?
>>>>>
>>>>> There was in the past, with the now-defunct bsddb185 module which was
>>>>> not used by anydbm.
>>>>>
>>>>>> Returning a module object would seem the least attractive version --
>>>>>> that would require importing the module, which may not be in the
>>>>>> caller's plan at all.
>>>>>
>>>>> It may not be, but the modules are imported anyway during import of
>>>>> dbm.__init__ (which contains whichdb() now.)
>>>>
>>>> Hm, that's a regression if you ask me. Couldn't you use lazy import?
>>>
>>> There's a module attribute "error" -- supposed to be a tuple of all
>>> possible errors from the db modules; that is hard to make lazy.
>>>
>>> Of course we could solve this by making all the different db module
>>> errors subclasses of a common exception (but since most of them are
>>> defined in C modules, this is hard again.)
>>
>> OK, let's make the latter a stretch goal. :-)
>
> I just realized: since dumbdbm's "error" is just IOError, using
> "except [any]dbm.error" will always catch IOError.
>
> So the easy solution is to just derive the database error classes
> from IOError.
>
> The slightly harder solution is to declare the above a bug, create
> a new builtin (at least on C level) error class DBError and derive
> them from that.

I think deriving them all from IOError is good enough.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Wed May 28 00:34:11 2008
From: guido at python.org (Guido van Rossum)
Date: Tue, 27 May 2008 15:34:11 -0700
Subject: [Python-3000] how to deal with compatibility problems (example:
	turtle module)
In-Reply-To: <483C7FE9.6070909@aon.at>
References: <483C7399.3050103@aon.at>
	<ca471dc20805271357m570b875aja710b88b3da5e72a@mail.gmail.com>
	<483C7FE9.6070909@aon.at>
Message-ID: <ca471dc20805271534v18e53f3ei6647592e46cba182@mail.gmail.com>

In the light of that, I'm not opposed to relaxing the 100%
compatibility requirement.

On Tue, May 27, 2008 at 2:40 PM, Gregor Lingl <gregor.lingl at aon.at> wrote:
>
>
> Guido van Rossum schrieb:
>>
>> The old turtle.py explicitly says
>>
>> from math import * # Also for export
>>
>> so I think it's desirable to keep this behavior. My intent with that
>> line was that an absolute beginner could put "from turtle import *" in
>> their interactive session and be able to use both the turtle code and
>> the high-school math functions that might come in handy, like sin()
>> and cos(). The other math functions don't really hurt, I believe. Where
>> there's a naming conflict, obviously the turtle module wins.
>>
>> --Guido
>>
>
> Thanks for the quick reply, I'll do it this way.
>
> Gregor
>
> P.S.: I'd just like to add one (critical) remark which results from some
> decades working as a highschool teacher and (nearly) one decade working
> with Python, and especially turtle graphics with highschool students:
>
> sin() and cos() imported from math work with radians. The default
> angle-mode for turtle is degrees. So when using trig-functions I have
> to talk about radian measure and conversion of angle units. To calculate
> the sine of 30 degrees, for instance, I would have to call sin(radians(30)) etc.,
> but unfortunately this very radians() function is no longer available
> when doing from turtle import *. So in this case this import is of limited
> use. And it definitely makes sense to tell highschool students that
> sin(), cos() and friends live in a module called math.
>
>
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From jimjjewett at gmail.com  Wed May 28 00:40:48 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 May 2008 18:40:48 -0400
Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka String
	ABC)
In-Reply-To: <e5fff6640805271309ybe102d3y3526f0e32141df2@mail.gmail.com>
References: <loom.20080527T192243-415@post.gmane.org>
	<ca471dc20805271242q63e80104mfad81d1b1014df60@mail.gmail.com>
	<e5fff6640805271309ybe102d3y3526f0e32141df2@mail.gmail.com>
Message-ID: <fb6fbf560805271540s9bd4eb6oaa53a3f79dfef688@mail.gmail.com>

On 5/27/08, Benji York wrote:
> Guido van Rossum wrote:
>  > Armin Ronacher wrote:

> >> Basically *the* problematic situation with iterable strings is something like
>  >> a `flatten` function that flattens out every iterable object except strings.

> > I'm not against this, but so far I've not been able to come up with a
>  > good set of methods to endow the String ABC with. Another problem is
>  > that not everybody draws the line in the same place -- how should
>  > instances of bytes, bytearray, array.array, memoryview (buffer in 2.6)
>  > be treated?

> Maybe the opposite approach would be more fruitful.  Flattening is about
>  removing nested "containers", so perhaps there should be an ABC that
>  things like lists and tuples provide, but strings don't.  No idea what
>  that might be.

It isn't really stringiness that matters, it is that you have to
terminate even though you still have an iterable container.

The test is roughly (1==len(v) and v[0]==v), except that you want to
stop a layer sooner.

Guido had at least a start in Searchable, back when ABCs were still in
the sandbox:
http://svn.python.org/view/sandbox/trunk/abc/abc.py?rev=55321&view=auto

Searchable represented the fact that (x in c) =/=> (x in iter(c))
because of sequence searches like ("Error" in results)
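
Both points can be seen in a short sketch (written for Python 3; the
sample strings are made up):

```python
# A one-character string is its own first element, so naive recursion
# into it never bottoms out.
text = "a"
string_is_fixed_point = len(text) == 1 and text[0] == text

# A one-element list does not have this property.
items = [1]
list_is_fixed_point = len(items) == 1 and items[0] == items

# Containment on a string is a substring search, which is not the same
# as looking for the value among the items produced by iteration.
results = "Fatal Error: disk full"
substring_found = "Error" in results
element_found = "Error" in iter(results)
```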

-jJ

From python at rcn.com  Wed May 28 00:54:20 2008
From: python at rcn.com (Raymond Hettinger)
Date: Tue, 27 May 2008 15:54:20 -0700
Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka StringABC)
References: <loom.20080527T192243-415@post.gmane.org><ca471dc20805271242q63e80104mfad81d1b1014df60@mail.gmail.com><e5fff6640805271309ybe102d3y3526f0e32141df2@mail.gmail.com>
	<fb6fbf560805271540s9bd4eb6oaa53a3f79dfef688@mail.gmail.com>
Message-ID: <1791A55949334749A9DF448461C9B11A@RaymondLaptop1>

 "Jim Jewett" 
> It isn't really stringiness that matters, it is that you have to
> terminate even though you still have an iterable container.

Well said.


> Guido had at least a start in Searchable, back when ABC
> were still in the sandbox:

Have to disagree here.  An object cannot know in general
whether a flattener wants to split it or not.  That is an application
dependent decision.  A better answer is to be able to tell the
flattener what should be considered atomic in a given circumstance.
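
One possible shape for such a flattener, as a sketch only (the function
name and the atomic parameter are assumptions, not an existing API):

```python
def flatten(items, atomic=(str, bytes)):
    """Yield the leaves of a nested iterable.

    Instances of the types in `atomic` are yielded whole instead of
    being iterated, so the caller decides where splitting stops.
    """
    for item in items:
        if isinstance(item, atomic):
            yield item
            continue
        try:
            children = iter(item)
        except TypeError:
            # Not iterable at all, so it is a leaf.
            yield item
        else:
            yield from flatten(children, atomic)

nested = [1, ["ab", [2, (3, "cd")]]]
split_tuples = list(flatten(nested))
keep_tuples = list(flatten(nested, atomic=(str, bytes, tuple)))
```

The caller has to keep str in atomic: without it, a one-character string
would be iterated into itself forever.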


Raymond

From ncoghlan at gmail.com  Wed May 28 01:17:36 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 28 May 2008 09:17:36 +1000
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805271208xa425820tb1fed0e1ae1859d1@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805271208xa425820tb1fed0e1ae1859d1@mail.gmail.com>
Message-ID: <483C9690.9010601@gmail.com>

Jim Jewett wrote:
> So I suggest that he or she use str, rather than repr -- and that we
> fix containers to make this possible.

And hope that every other author of a Python container class on the 
planet does the same thing?

Recursing downwards with str() instead of repr() will break as soon as 
it encounters a container class which either doesn't recurse with str() 
or doesn't propagate a new "this is really str()" flag (depending on how 
Oleg's PEP suggests implementing this).

PEP 3138 fixes the problem without relying on third parties to do anything.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Wed May 28 01:21:31 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 28 May 2008 09:21:31 +1000
Subject: [Python-3000] UPDATED: PEP 3138- String
 representation	in	Python 3000
In-Reply-To: <483C2939.2000409@latte.ca>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>	<483B7668.1090800@gmail.com>	<20080527062947.GA14808@phd.pp.ru>
	<483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca>
Message-ID: <483C977B.2000500@gmail.com>

Blake Winton wrote:
> Nick Coghlan wrote:
>> While it could be argued that if you want unambiguous output you 
>> should be invoking repr() on the container instead of str(), I'm still 
>> seeing many more downsides than upsides to the idea of making str() on 
>> the builtin containers display their contents with str() instead of 
>> repr().
> 
> But which downsides do you see that aren't solved by the use of repr to 
> get unambiguous output?

The fact that calling str() on containers has been unambiguous for 
years. All I'm saying is that no compelling use cases have been 
presented to justify changing the status quo (aside from the Unicode 
escaping problem, which is better addressed by allowing repr() to return 
arbitrary Unicode glyphs as proposed by PEP 3138, since that also fixes 
a bunch of other cases where repr() is invoked on Unicode strings).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From greg.ewing at canterbury.ac.nz  Wed May 28 02:25:50 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 May 2008 12:25:50 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <483BE3D8.4080806@egenix.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805222130h6a3b63d5x59ec615da533c376@mail.gmail.com>
	<797440730805230028o94271eexf1d7a77bb5e191ea@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<483BE3D8.4080806@egenix.com>
Message-ID: <483CA68E.7040909@canterbury.ac.nz>

M.-A. Lemburg wrote:

> AFAIK, eval(repr(obj)) is no longer a requirement... simply because
> it has always only worked for a small subset of objects and in
> reality, you wouldn't want to call eval() on anything too often
> due to the security implications.

Yes, I actually think that sentence in the docs should
be removed, since it's more misleading than helpful.

A suitable replacement might be something like "str()
is intended for normal program output, and repr() is
intended for diagnostic output". Plus something about
it being desirable for repr() to make the type of the
object as unambiguous as possible.
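
A small sketch of why the eval(repr(obj)) claim only holds for a subset
of objects (Point is a made-up class that keeps the default repr):

```python
# Built-in literals round-trip through repr() and eval().
value = [1, "a", 2.5]
round_trips = eval(repr(value)) == value

class Point:
    """Class with the default object repr, e.g. <...Point object at 0x...>."""

# The default repr is not valid Python source, so eval() rejects it.
try:
    eval(repr(Point()))
    point_round_trips = True
except SyntaxError:
    point_round_trips = False
```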

-- 
Greg

From jimjjewett at gmail.com  Wed May 28 02:52:06 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 May 2008 20:52:06 -0400
Subject: [Python-3000] UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <483C977B.2000500@gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru>
	<483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca>
	<483C977B.2000500@gmail.com>
Message-ID: <fb6fbf560805271752n14c383aaofd0b6aef6d679dc8@mail.gmail.com>

On 5/27/08, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Blake Winton wrote:

> > But which downsides do you see that aren't solved by the use of repr to
> get unambiguous output?

>  The fact that calling str() on containers has been unambiguous for years.
> All I'm saying is that no compelling use cases have been presented to
> justify changing the status quo (aside from the Unicode escaping problem,
> which is better addressed by allowing repr() to return arbitrary Unicode
> glyphs as proposed by PEP 3138, since that also fixes a bunch of other cases
> where repr() is invoked on Unicode strings).

I think it is pretty clear that there are sometimes reasons to want
more than one string representation.  There are arguably far more than
two distinctions that would be useful, but two is what the language
supports.  That was the justification for str vs repr in the first
place.

What is the advantage in continuing to conflate the two for (only
portions of) containers?

The only justification that I can see is "backwards compatibility",
which applies even more strongly to repr than it does to str.

-jJ

From greg.ewing at canterbury.ac.nz  Wed May 28 03:09:19 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 May 2008 13:09:19 +1200
Subject: [Python-3000] UPDATED: PEP 3138- String
 representation	in	Python 3000
In-Reply-To: <483C2939.2000409@latte.ca>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru>
	<483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca>
Message-ID: <483CB0BF.2060505@canterbury.ac.nz>

Blake Winton wrote:

> Seriously, I can write:
>  >>> print 1, "1", Decimal("1")
> and get as my output:
> 1 1 1

Yes, but you've explicitly told it to print that,
so presumably it's what you want in that case.

Equally, you need to be explicit about how you want
a list printed.

-- 
Greg

From jimjjewett at gmail.com  Wed May 28 03:44:54 2008
From: jimjjewett at gmail.com (Jim Jewett)
Date: Tue, 27 May 2008 21:44:54 -0400
Subject: [Python-3000] UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <483CB0BF.2060505@canterbury.ac.nz>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru>
	<483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca>
	<483CB0BF.2060505@canterbury.ac.nz>
Message-ID: <fb6fbf560805271844g772de6f8wa6c5d28be4b3ca8f@mail.gmail.com>

On 5/27/08, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Blake Winton wrote:

> > Seriously, I can write:
> >  >>> print 1, "1", Decimal("1")
> > and get as my output:
> > 1 1 1

>  Yes, but you've explicitly told it to print that,
>  so presumably it's what you want in that case.

>  Equally, you need to be explicit about how you want
>  a list printed.

Agreed; and that is why I consider the current behavior a bug.

If you want the type information in there, then you should use repr
instead of str.  For example, to get the type information for [1,
"1", Decimal("1")], writing:

    print repr([1, "1", Decimal("1")])

is not such a huge problem.  On the other hand, if you do not care
about the specific types, and want to declutter the output, then
writing:

    print ("[" + ", ".join(str(e) for e in [1, "1", Decimal("1")]) + "]")

is a bit more awkward in the best case -- and fails if your data
structures are not all nested to exactly the same depth.  Suddenly,
you need to rewrite the equivalent of the pprint module.

Again, what is the advantage of having str(x) be redundant to repr(x)
in the case of containers?
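
To make the nesting problem concrete, here is a rough sketch of the
kind of helper one ends up writing (Python 3 syntax; str_nested is a
hypothetical name, and it only handles lists and tuples):

```python
from decimal import Decimal


def str_nested(obj):
    """Hypothetical helper: like str(), but applies str() to items at every depth."""
    if isinstance(obj, list):
        return "[" + ", ".join(str_nested(item) for item in obj) + "]"
    if isinstance(obj, tuple):
        inner = ", ".join(str_nested(item) for item in obj)
        # Keep the trailing comma that marks a one-element tuple.
        return "(" + inner + ("," if len(obj) == 1 else "") + ")"
    # Anything that is not a list or tuple is rendered with plain str().
    return str(obj)


data = [1, "1", Decimal("1"), ["a", ("b", Decimal("2.5"))]]

print(repr(data))        # type information kept
print(str(data))         # currently identical to repr(data)
print(str_nested(data))  # decluttered at every level
```

Dicts, sets and user-defined containers would each need their own
branch, which is where this starts to look like a second pprint.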

-jJ

From greg.ewing at canterbury.ac.nz  Wed May 28 03:45:00 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 May 2008 13:45:00 +1200
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805271233i5412e271sea69f9e80b935800@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<483BE3D8.4080806@egenix.com>
	<fb6fbf560805271233i5412e271sea69f9e80b935800@mail.gmail.com>
Message-ID: <483CB91C.9090900@canterbury.ac.nz>

Jim Jewett wrote:

> PEP 3138 says that repr should start printing unicode glyphs.
> 
> I say that repr should (instead) start recognizing when it was called
> in place of __str__, and should revert back to __str__ when it
> recurses down to the next level.

But we *don't* all agree that the only case where we want
unicode glyphs is when we call str().

I can understand a Japanese user wanting to see his text in
Japanese when he uses repr() explicitly to debug his program.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Wed May 28 04:07:58 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 May 2008 14:07:58 +1200
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To: <g1hvto$kh3$1@ger.gmane.org>
References: <g1675f$hf4$1@ger.gmane.org> <g1hvto$kh3$1@ger.gmane.org>
Message-ID: <483CBE7E.9080902@canterbury.ac.nz>

Travis Oliphant wrote:
> Obviously, if you 
> haven't provided a Py_buffer structure to fill in, then you are only 
> asking to lock the object's buffer from other access.

What's the use case for that? Why would you ever want
to lock an object if you don't intend to access it?

BTW, I seem to remember when the PEP was being discussed
that there was talk of putting some intelligence into the
PyObject_* layer to make things easier for both the
user and the provider, such as filling in some members
of the Py_buffer if the provider didn't do it. Did
anything come of that?

-- 
Greg

From greg.ewing at canterbury.ac.nz  Wed May 28 04:41:55 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 28 May 2008 14:41:55 +1200
Subject: [Python-3000] UPDATED: PEP 3138- String representation in
 Python 3000
In-Reply-To: <fb6fbf560805271844g772de6f8wa6c5d28be4b3ca8f@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<483B7668.1090800@gmail.com> <20080527062947.GA14808@phd.pp.ru>
	<483BC2B3.6040308@gmail.com> <483C2939.2000409@latte.ca>
	<483CB0BF.2060505@canterbury.ac.nz>
	<fb6fbf560805271844g772de6f8wa6c5d28be4b3ca8f@mail.gmail.com>
Message-ID: <483CC673.4020804@canterbury.ac.nz>

Jim Jewett wrote:
> Again, what is the advantage of having str(x) be redundant to repr(x)
> in the case of containers?

I think you're misrepresenting the situation when you
describe it that way.

Guido didn't sit down and think "I know, let's make
str(lst) do the same as repr(lst)." He thought
"It's not clear what str(lst) should do, so let's
not define it at all."

There can't be a bug in list.__str__, because
list.__str__ *does not exist*. If you want one, you
have to decide what you want it to do and write it
yourself.
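
As a minimal sketch of what "write it yourself" could look like
(Python 3; StrList and its comma-separated format are assumptions
made up for this example, not anything in the stdlib):

```python
class StrList(list):
    """Hypothetical list subclass that decides for itself what str() means."""

    def __str__(self):
        # One arbitrary choice among many: items joined by ", ", no brackets.
        return ", ".join(str(item) for item in self)


# The built-in list type has no __str__ of its own; str() falls back to repr().
defines_own_str = "__str__" in list.__dict__

items = StrList(["spam", 1, 2.5])
print(defines_own_str)
print(str(items))
print(repr(items))
```

repr() is untouched by the subclass, so diagnostic output keeps the
quotes and brackets.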

I've never found this to be a problem. Either
repr(lst) is good enough, or I've wanted something
completely different and had to write my own code
for it anyway.

I've *never* wanted anything that was just like
repr(lst) except that it didn't quote the strings.
That would only be confusing.

-- 
Greg

From carl at carlsensei.com  Wed May 28 05:48:41 2008
From: carl at carlsensei.com (Carl Johnson)
Date: Tue, 27 May 2008 17:48:41 -1000
Subject: [Python-3000] Proposal to add __str__ method to iterables.
Message-ID: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com>

Proposal to add __str__ method to iterables:

Proposed behavior of the __str__ method for iterables is that it  
returns the result of "".join(str(i) for i in self).

Justification:

Notice this difference in the behavior of filter* and a list  
comprehension:

 >>> filter(lambda c: c!="a", "abracadbra")
'brcdbr'
 >>> [c for c in "abracadbra" if c != "a"]
['b', 'r', 'c', 'd', 'b', 'r']

*This is the pre-3.0 filter's behavior. Post-3.0, "filter" is really  
ifilter.

In order to replicate the behavior of filter with a comprehension, the  
return type must be the same as the input type:

 >>> def my_filter(cond, it):
...     return type(it)(i for i in it if cond(i))

Thus, we get the same results using the old style filter and my_filter:

 >>> filter(lambda c: c!="a", (1,2))
(1, 2)
 >>> my_filter(lambda c: c!="a", (1,2))
(1, 2)
 >>> filter(lambda c: c!="a", [1,2])
[1, 2]
 >>> my_filter(lambda c: c!="a", [1,2])
[1, 2]

But not in every case!

 >>> filter(lambda c: c!="a", "abracadbra")
'brcdbr'
 >>> my_filter(lambda c: c!="a", "abracadbra")
'<generator object at 0x27c2300>'

Why does my_filter return a string saying "<generator object at  
blah>"? Because generator objects have no __str__ method, so  
str(gen_obj) returns gen_obj.__repr__().
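
For reference, a sketch of the same effect under Python 3, together
with the workaround that is needed today (my_filter_str_aware is a
hypothetical name used only here):

```python
def my_filter(cond, it):
    # Same definition as in the interactive session above.
    return type(it)(i for i in it if cond(i))


def my_filter_str_aware(cond, it):
    """Hypothetical variant that special-cases strings with "".join."""
    kept = (i for i in it if cond(i))
    if isinstance(it, str):
        # str(generator) would give the generator's repr, so join instead.
        return "".join(kept)
    return type(it)(kept)


not_a = lambda c: c != "a"

broken = my_filter(not_a, "abracadbra")
fixed = my_filter_str_aware(not_a, "abracadbra")

print(broken)
print(fixed)
```

The special case for str is exactly the asymmetry the proposal is
trying to remove.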

So, my proposal is to make strings act like the other members of the  
iterable family by adding an __str__ method to them, which does a  
"".join on the str of its members.

- - - -

Potential downside #1: Don't try to print an infinite object, like  
itertools.count().

Other potential downside #2: This makes "".join(l) obsolete.

Regarding #1: Do a repr instead.

Regarding #2: I don't consider that to be a bad thing actually. I  
think doing "".join is very unnatural for people new to Python, and
even as people who are used to Python, I think we should admit that
it's a little weird to join list members in that way.

In terms of actual implementation, this could also be done by having  
the str class look for a __str__ method then a __iter__ method and  
only then use __repr__ as the final fallback instead of falling back  
to __repr__ as is done now. That might be easier than adding __str__  
methods to all iterables.
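
A rough pure-Python sketch of that lookup order, assuming "has a
__str__ method" means "overrides the one inherited from object"
(proposed_str is a hypothetical stand-in for the changed str()):

```python
def proposed_str(obj):
    """Hypothetical model of the proposed str() fallback chain."""
    cls = type(obj)
    # 1. A __str__ that overrides object.__str__ wins, as it does today.
    if cls.__str__ is not object.__str__:
        return str(obj)
    # 2. Otherwise, an iterable is joined from the str of its members.
    if hasattr(cls, "__iter__"):
        return "".join(proposed_str(item) for item in obj)
    # 3. Only then fall back to repr().
    return repr(obj)


print(proposed_str(["1", "2"]))
print(proposed_str(c for c in "abc"))
print(proposed_str(42))
```

Note that step 2 never returns for an infinite iterable such as
itertools.count(), which is downside #1 above.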

- - - -

Incidentally, I think the idea that str(["1", "2"]) should return "[1,  
2]" is a terrible idea. Where's the use case for that? When would you  
ever need to print that? It should return "12", which actually does  
have a use case as the replacement for "".join(["1", "2"]).

From mal at egenix.com  Wed May 28 12:12:27 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 28 May 2008 12:12:27 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483BDE11.509@egenix.com>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>
	<483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com>
Message-ID: <483D300B.5090309@egenix.com>

I'm beginning to wonder whether I'm the only one who cares about
the Python 2.x branch not getting cluttered up with artifacts caused
by a broken forward merge strategy.

How can it be that we allow major C API changes such as the renaming
of the PyString APIs to go into the trunk without discussion or
a PEP ?

We're having lengthy discussions about the addition of a single method
to an object, but such major changes just go in like that and nobody
seems to really care.

Puzzled,
-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 28 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania            39 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611


On 2008-05-27 12:10, M.-A. Lemburg wrote:
> On 2008-05-26 23:34, Christian Heimes wrote:
>> M.-A. Lemburg schrieb:
>>> Isn't that an awfully confusing approach ?
>>>
>>> Wouldn't it be better to keep PyString APIs and definitions in
>>> stringobject.c|h
>>>
>>> and only add a new bytesobject.h header file that #defines the
>>> PyBytes APIs in terms of PyString APIs ? That maintains
>>> backwards compatibility and allows Python internals to use the
>>> new API names.
>>>
>>> With your approach, you've basically backported the confusing
>>> notion in Py3k that str() maps PyUnicode, only that in Py2
>>> str() will now map to PyBytes.
>>
>> The last time I brought up the topic, I had a lengthy discussion with
>> Guido. At first I wanted to rename the API in Python 3.0 only. Guido
>> argued that it's going to cause too much merge conflicts. He then
>> suggested the approach I implemented today.
> 
> That's the same argument that came up in the module renaming
> discussion.
> 
> I have a feeling that we should be looking for better merge
> tools, rather than implement code changes that cause more trouble
> than do good, just because our existing tools aren't smart
> enough.
> 
> Wouldn't it be possible to have a 2to3.py converter
> take the 2.x code (including the C code), convert it and then
> apply any changes to the 3.x branch ?
> 
> This wouldn't be merging in the classical sense, it would be
> automated forward porting.
> 
>> I find the approach less confusing than your suggestion and my initial
>> idea.
> 
> I disagree on that.
> 
> Renaming old APIs to use the new names by adding a header file with
> #define <oldname> <newname> is standard practice.
> 
> Renaming the old APIs in the source code and undoing the renaming
> with a header file is not.
> 
>> The internal API names are consistent for Python 2.6 and 3.0. The
>> byte string C API is prefixed PyBytes and the unicode C API is prefixed
>> PyUnicode. A core developer has just to remember that 'str' is a byte
>> string in 2.x but an unicode object in 3.0.
> 
> So you've solved part of the problem for 3.x by moving the naming mixup
> back to 2.x.
> 
>> Extension developers don't have to worry at all. The ABI and external
>> API is mostly the same and still exposes the 'str' functions as PyString.
> 
> Well, yes, but only due to a preprocessor hack that turns the
> names used in bytesobject.c back into names you'd normally look
> for in stringobject.c.
> 
> And all this, just because Subversion can't handle merging of
> symbol renaming.
> 
>>> You'd have to add an alias bytes -> str to the builtins to
>>> at least reduce the confusion a bit.
>>
>> Python 2.6 already has an alias bytes -> str
>>
>>> Yes, but please let's first discuss this some more. I don't think
>>> that the timing was right.... you started this thread just yesterday
>>> and the patches are already checked in.
>>
>> I'm sorry if I was too hasty for you. I got +1 from a couple of
>> developers and it's basically Guido's suggestion.
> 
> Please discuss any changes of the 2.x code base on python-dev.
> 
> Such major changes do need more discussion and possibly a PEP as well.
> 
> Thanks,


From phd at phd.pp.ru  Wed May 28 13:55:15 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Wed, 28 May 2008 15:55:15 +0400
Subject: [Python-3000] str(containter) calls repr(item)
In-Reply-To: <20080527201450.GA29645@phd.pp.ru>
References: <797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<483BE3D8.4080806@egenix.com>
	<fb6fbf560805271233i5412e271sea69f9e80b935800@mail.gmail.com>
	<5d44f72f0805271304i5875fe0bn763c977e5e63cb5f@mail.gmail.com>
	<20080527201450.GA29645@phd.pp.ru>
Message-ID: <20080528115515.GA14748@phd.pp.ru>

On Wed, May 28, 2008 at 12:14:50AM +0400, Oleg Broytmann wrote:
>    I have written the PEP.

   I'm discussing the PEP with Jim Jewett - more motivation and better
wording - so the PEP will be published a bit later.

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From musiccomposition at gmail.com  Wed May 28 14:00:11 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Wed, 28 May 2008 07:00:11 -0500
Subject: [Python-3000] Proposal to add __str__ method to iterables.
In-Reply-To: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com>
References: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com>
Message-ID: <1afaf6160805280500we89354dv1e2308b11cc2d42e@mail.gmail.com>

On Tue, May 27, 2008 at 10:48 PM, Carl Johnson <carl at carlsensei.com> wrote:
> - - - -
>
> Potential downside #1: Don't try to print an infinite object, like
> itertools.count().
>
> Other potential downside #2: This makes "".join(l) obsolete.

No, it wouldn't. What if people want to join sequences with something
other than whitespace or whatever you propose?
>
> Regarding #1: Do a repr instead.
>
> Regarding #2: I don't consider that to be a bad thing actually. I think
> doing "".join is very unnatural for people new to Python, and even as
> people who are used to Python, I think we should admit that it's a
> little weird to join list members in that way.

It's good to have join on string object because then any iterable can
be joined. It doesn't require the sequence to implement it.
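
A short sketch of that point (Python 3; the numbers() generator is
made up for the example). The separator is chosen by the caller, and
the thing being joined needs nothing beyond being iterable:

```python
def numbers():
    # Any iterable works, including a generator with no methods of its own.
    yield from (1, 2, 3)


joined_list = ", ".join(["a", "b", "c"])
joined_gen = "-".join(str(n) for n in numbers())
joined_keys = "/".join({"usr": 1, "lib": 2})  # iterating a dict yields its keys

print(joined_list)
print(joined_gen)
print(joined_keys)
```

None of list, generator or dict had to implement anything for this
to work.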
>
> In terms of actual implementation, this could also be done by having the str
> class look for a __str__ method then a __iter__ method and only then use
> __repr__ as the final fallback instead of falling back to __repr__ as is
> done now. That might be easier than adding __str__ methods to all iterables.
>
> - - - -
>
> Incidentally, I think the idea that str(["1", "2"]) should return "[1, 2]"
> is a terrible idea. Where's the use case for that? When would you ever need
> to print that? It should return "12", which actually does have a use case as
> the replacement for "".join(["1", "2"]).

However, it's not expected.


-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From lists at cheimes.de  Wed May 28 14:02:53 2008
From: lists at cheimes.de (Christian Heimes)
Date: Wed, 28 May 2008 14:02:53 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483BDE11.509@egenix.com>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>
	<483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com>
Message-ID: <483D49ED.8060907@cheimes.de>

M.-A. Lemburg schrieb:
> I have a feeling that we should be looking for better merge
> tools, rather than implement code changes that cause more trouble
> than do good, just because our existing tools aren't smart
> enough.

We don't have better tools at our hands. I don't think we'll get any
tools in time or change the VCS right before a major release.

> Wouldn't it be possible to have a 2to3.py converter
> take the 2.x code (including the C code), convert it and then
> apply any changes to the 3.x branch ?

Such a converter would be nice for 3rd party code but it's not an option
for the core. In the past few months I've merged a lot of code from
trunk to py3k. A 2to3 C converter doesn't help with merge conflicts.
Naming differences make any merge more painful.

>> I find the approach less confusing than your suggestion and my initial
>> idea.
> 
> I disagree on that.
> 
> Renaming old APIs to use the new names by adding a header file with
> #define <oldname> <newname> is standard practice.
> 
> Renaming the old APIs in the source code and undoing the renaming
> with a header file is not.

I wasn't talking about standard practice here. I talked about less
confusion for core developers. My approach doesn't split our internal
API in two.
And by the way it *is* a standard approach for Python. Guido told me
that the same approach was used during the 1.x to 2.0 migration.

> And all this, just because Subversion can't handle merging of
> symbol renaming.

As I said earlier we don't have better tools at our disposal. We have to
make some compromises. Sometimes practicality beats purity.

> Please discuss any changes of the 2.x code base on python-dev.
> 
> Such major changes do need more discussion and possibly a PEP as well.

In the last few months I started at least three topics about the C API
renaming. It's in the thread "2.6 and 3.0 tasks"
http://permalink.gmane.org/gmane.comp.python.devel/93016

Christian

From p.f.moore at gmail.com  Wed May 28 14:29:32 2008
From: p.f.moore at gmail.com (Paul Moore)
Date: Wed, 28 May 2008 13:29:32 +0100
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483D300B.5090309@egenix.com>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com>
	<483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com>
	<483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com>
	<483D300B.5090309@egenix.com>
Message-ID: <79990c6b0805280529vefcb2a6l200afda6222503f8@mail.gmail.com>

On 28/05/2008, M.-A. Lemburg <mal at egenix.com> wrote:
> I'm beginning to wonder whether I'm the only one who cares about
> the Python 2.x branch not getting cluttered up with artifacts caused
> by a broken forward merge strategy.

I care, but I struggle to understand the implications and/or what is
being proposed in many cases.

Recent examples are the ABC backports and the current thread (string C
API). I simply don't follow the issues well enough to comment.

> How can it be that we allow major C API changes such as the renaming
> of the PyString APIs to go into the trunk without discussion or
> a PEP ?

Christian has raised this a couple of times, but there has been little
discussion. I suspect that this is because there is not enough clarity
over the practical consequences. A PEP may help here, but I'm not sure
how much - it could spark discussion, but would anyone actually end up
any better informed?

> We're having lengthy discussions about the addition of a single method
> to an object, but such major changes just go in like that and nobody
> seems to really care.

I suspect deadline pressure and burnout are involved here.

In all honesty, there's been little or no work done on the C API,
which is just as much in need of review and possible cleanup for 3.0
as the language. It's as close as makes no difference to too late now
- does that mean we've lost the chance?

Paul.

From mal at egenix.com  Wed May 28 14:59:33 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 28 May 2008 14:59:33 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <79990c6b0805280529vefcb2a6l200afda6222503f8@mail.gmail.com>
References: <48397ECC.9070805@cheimes.de>
	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>
	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>
	<483BDE11.509@egenix.com>	<483D300B.5090309@egenix.com>
	<79990c6b0805280529vefcb2a6l200afda6222503f8@mail.gmail.com>
Message-ID: <483D5735.4090608@egenix.com>

On 2008-05-28 14:29, Paul Moore wrote:
> On 28/05/2008, M.-A. Lemburg <mal at egenix.com> wrote:
>> I'm beginning to wonder whether I'm the only one who cares about
>> the Python 2.x branch not getting cluttered up with artifacts caused
>> by a broken forward merge strategy.
> 
> I care, but I struggle to understand the implications and/or what is
> being proposed in many cases.

Thanks, so I'm not the only one :-)

> Recent examples are the ABC backports and the current thread (string C
> API). I simply don't follow the issues well enough to comment.
> 
>> How can it be that we allow major C API changes such as the renaming
>> of the PyString APIs to go into the trunk without discussion or
>> a PEP ?
> 
> Christian has raised this a couple of times, but there has been little
> discussion. I suspect that this is because there is not enough clarity
> over the practical consequences. A PEP may help here, but I'm not sure
> how much - it could spark discussion, but would anyone actually end up
> any better informed?

Probably, yes.

The reason is that if you have a PEP, more people are likely to
review it and make comments.

If you start a discussion with a general subject line which then
results in lots of little sub-threads, important aspects of the
discussion are likely to go unnoticed in the noise.

>> We're having lengthy discussions about the addition of a single method
>> to an object, but such major changes just go in like that and nobody
>> seems to really care.
> 
> I suspect deadline pressure and burnout are involved here.
> 
> In all honesty, there's been little or no work done on the C API,
> which is just as much in need of review and possible cleanup for 3.0
> as the language. It's as close as makes no difference to too late now
> - does that mean we've lost the chance?

Perhaps, but the C API is certainly not used by as many people
as the Python front-end and changes to the C API also have much
deeper consequences due to the API being written in C rather than
Python.

Overall, I don't think there's a lot to cleanup in the C API.
Perhaps remove a few of those '...Ex()' APIs that were introduced
to extend the original APIs and maybe remove or free up a few
type slots that are no longer needed, but that's about it.

-- 
Marc-Andre Lemburg
eGenix.com


From mal at egenix.com  Wed May 28 14:47:00 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Wed, 28 May 2008 14:47:00 +0200
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
 (Stabilizing the C API of 2.6 and 3.0)
In-Reply-To: <483D49ED.8060907@cheimes.de>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>
	<483BDE11.509@egenix.com> <483D49ED.8060907@cheimes.de>
Message-ID: <483D5444.8000705@egenix.com>

On 2008-05-28 14:02, Christian Heimes wrote:
> M.-A. Lemburg schrieb:
>> I have a feeling that we should be looking for better merge
>> tools, rather than implement code changes that cause more trouble
>> than do good, just because our existing tools aren't smart
>> enough.
> 
> We don't have better tools at our hands. I don't think we'll get any
> tools in time or change the VCS right before a major release.
> 
>> Wouldn't it be possible to have a 2to3.py converter
>> take the 2.x code (including the C code), convert it and then
>> apply any changes to the 3.x branch ?
> 
> Such a converter would be nice for 3rd party code but it's not an option
> for the core. In the past few months I've merged a lot of code from
> trunk to py3k. A 2to3 C converter doesn't help with merge conflicts.
> Naming differences make any merge more painful.

I was suggesting to not use SVN to merge changes directly, but to
instead use an intermediate step in the process:

Init:

  1. grab the latest trunk

  2. apply a 2to3 converter to the Python code and the C code,
     applying any renaming that may be necessary

  3. save this converted version in a separate branch merge-branch

Update:

  1. check out the merge-branch and grab the latest trunk and
     3.x branch

  2. apply a 2to3 converter to the Python code and the C code,
     applying any renaming that may be necessary

  3. copy the files over your working copy of the merge-branch

  4. create a diff on the merge-branch

  5. apply the diffs to 3.x branch, resolving any conflicts
     as necessary

This doesn't require new tools (except for some C renaming
support in the 2to3 tool). It only changes the procedure.
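
To give an idea of how small the C renaming step could be, here is a
very rough sketch in Python. This is not part of 2to3; the function
name rename_pystring and the regex are assumptions for illustration,
and a real fixer would also have to handle names such as
PyStringObject, macros, and comments:

```python
import re

# Matches identifiers such as PyString_FromString or PyString_Type,
# but not ones where PyString_ is only part of a longer name.
_PYSTRING = re.compile(r"\bPyString_(\w+)")


def rename_pystring(c_source):
    """Hypothetical renaming pass: PyString_* -> PyBytes_* in C source text."""
    return _PYSTRING.sub(r"PyBytes_\1", c_source)


before = 'PyObject *s = PyString_FromString("x");'
after = rename_pystring(before)
print(after)
```

Running such a pass over the trunk sources in step 2 would produce
the converted tree that gets diffed in step 4.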

We'd basically follow our own suggestions w/r to porting to 3.x,
which is to make changes in the 2.x code, apply 2to3 and then
apply remaining fixes there.

I'm suggesting this, since 3.x is likely to introduce more
Python stdlib and C API changes. The process would likely also
makes a lot of other changes more easily manageable and reduce
the overall merge conflicts.

>>> I find the approach less confusing than your suggestion and my initial
>>> idea.
>> I disagree on that.
>>
>> Renaming old APIs to use the new names by adding a header file with
>> #define <oldname> <newname> is standard practice.
>>
>> Renaming the old APIs in the source code and undoing the renaming
>> with a header file is not.
> 
> I wasn't talking about standard practice here. I talked about less
> confusion for core developers. My approach doesn't split our internal
> API in two.

No, but it does apply a well hidden renaming which will cause
confusion when using a debugger to trace calls in C code.

If you use PyBytes APIs, you expect to find PyBytes functions in
the libs and also set breakpoints on these.

With the renaming we don't have two sets of APIs (old and new) exposed
in the lib, like what we normally do when applying changes to API names.

> And by the way it *is* a standard approach for Python. Guido told me
> that the same approach was used during the 1.x to 2.0 migration.

There was no API change between 1.6 and 2.0.

You are probably talking about the great renaming between 1.4 and 1.5.
That was different, since it changes almost all C APIs in Python.
And it used the standard practice... from rename2.h in Python 1.5:

/* This file contains a bunch of #defines that make it possible to use
    "old style" names (e.g. object) with the new style Python source
    distribution. */

#define True Py_True
#define False Py_False
#define None Py_None

ie. #define <oldname> <newname>

>> And all this, just because Subversion can't handle merging of
>> symbol renaming.
> 
> As I said earlier we don't have better tools at our disposal. We have to
> make some compromises. Sometimes practicality beats purity.

See above.

>> Please discuss any changes of the 2.x code base on python-dev.
>>
>> Such major changes do need more discussion and possibly a PEP as well.
> 
> In the last few months I started at least three topics about the C API
> renaming. It's in the thread "2.6 and 3.0 tasks"
> http://permalink.gmane.org/gmane.comp.python.devel/93016

Thanks. I stopped reading that thread after Guido's reply in

http://comments.gmane.org/gmane.comp.python.devel/92541

It would really help if subject lines were more specific.

This thread also uses a much too general subject line (which is
why I changed it).

-- 
Marc-Andre Lemburg
eGenix.com


From ncoghlan at gmail.com  Wed May 28 15:43:13 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Wed, 28 May 2008 23:43:13 +1000
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
 (Stabilizing the C API of 2.6 and 3.0)
In-Reply-To: <483D5444.8000705@egenix.com>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>
	<483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com>
Message-ID: <483D6171.5000208@gmail.com>

M.-A. Lemburg wrote:
> You are probably talking about the great renaming between 1.4 and 1.5.
> That was different, since it changes almost all C APIs in Python.
> And it used the standard practice... from rename2.h in Python 1.5:
> 
> /* This file contains a bunch of #defines that make it possible to use
>    "old style" names (e.g. object) with the new style Python source
>    distribution. */
> 
> #define True Py_True
> #define False Py_False
> #define None Py_None
> 
> ie. #define <oldname> <newname>

This is what I expected to see in stringobject.h, along with some code 
in stringobject.c to allow the linker to see the old names *as well as* 
the new names.

At the moment, all the code appears to be using the new names, but 
stringobject.h implicitly converts the new names back to the old names - 
so trying to use ctypes to retrieve the PyBytes_* functions from the 
Python DLL will fail.
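
A quick way to see which names the linker really exports is to probe the
running interpreter through ctypes. This is only a sketch: the helper name
`exported` is made up for illustration, and what it prints depends entirely
on the Python build it is run against.

```python
import ctypes

def exported(name):
    """Return True if the running interpreter's library exports `name`."""
    try:
        # Attribute access on a ctypes library performs the symbol lookup
        # and raises AttributeError when the symbol is not exported.
        getattr(ctypes.pythonapi, name)
    except AttributeError:
        return False
    return True

# Compare the old and new spellings of the same C API function.
for name in ("PyString_FromString", "PyBytes_FromString"):
    print(name, exported(name))
```

If only one of the two spellings shows up, ctypes users are limited to that
spelling no matter what the header macros say.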

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From dalcinl at gmail.com  Wed May 28 16:35:06 2008
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Wed, 28 May 2008 11:35:06 -0300
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To: <483CBE7E.9080902@canterbury.ac.nz>
References: <g1675f$hf4$1@ger.gmane.org> <g1hvto$kh3$1@ger.gmane.org>
	<483CBE7E.9080902@canterbury.ac.nz>
Message-ID: <e7ba66e40805280735g291eb150k90eef21afcdb8f1@mail.gmail.com>

On 5/27/08, Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
> Travis Oliphant wrote:
>
> > Obviously, if you haven't provided a Py_buffer structure to fill in, then
> you are only asking to lock the object's buffer from other access.
> >
>
>  What's the use case for that? Why would you ever want
>  to lock an object if you don't intend to access it?
>

Well, if we have already accessed the object, stored the raw memory
pointer, and hold a reference to it, and now we want another thread to
operate on the raw memory, doesn't it make sense to just lock the
object?

In the context of MPI communication, I believe I have a use case,
using something called persistent communication requests. You emit a
Comm.Recv_init() call with the pointer to the buffer receiving the
message (then you have to ask the object for the buffer pointer). The
Comm.Recv_init() returns a 'Prequest' instance (persistent request),
but the actual communication does not initiate until you call
Prequest.Start(). Then when you initiate the communication, we should
lock the object until the communication finalizes, because then we
could release the GIL but protect the raw memory from other accesses.


-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594

From janssen at parc.com  Wed May 28 19:08:23 2008
From: janssen at parc.com (Bill Janssen)
Date: Wed, 28 May 2008 10:08:23 PDT
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483D300B.5090309@egenix.com> 
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com>
	<483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com>
	<483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com>
	<483D300B.5090309@egenix.com>
Message-ID: <08May28.100829pdt."58698"@synergy1.parc.xerox.com>

> I'm beginning to wonder whether I'm the only one who cares about
> the Python 2.x branch not getting cluttered up with artifacts caused
> by a broken forward merge strategy.

I share your concern.  Seems to me that perhaps (not sure, but
perhaps) the rush to back-port from 3.x, and the concern about
minimizing pain of moving from 2.x to 3.x, has become the tail wagging
the dog.

Bill

From brett at python.org  Wed May 28 21:40:37 2008
From: brett at python.org (Brett Cannon)
Date: Wed, 28 May 2008 12:40:37 -0700
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <8453256766467481803@unknownmsgid>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com>
	<483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com>
	<483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com>
	<483D300B.5090309@egenix.com> <8453256766467481803@unknownmsgid>
Message-ID: <bbaeab100805281240n76e0aa9escc5b26307d581ecd@mail.gmail.com>

On Wed, May 28, 2008 at 10:08 AM, Bill Janssen <janssen at parc.com> wrote:
>> I'm beginning to wonder whether I'm the only one who cares about
>> the Python 2.x branch not getting cluttered up with artifacts caused
>> by a broken forward merge strategy.
>
> I share your concern.  Seems to me that perhaps (not sure, but
> perhaps) the rush to back-port from 3.x, and the concern about
> minimizing pain of moving from 2.x to 3.x, has become the tail wagging
> the dog.
>

Speaking for myself, I know that if fixing something in 2.x means a
pain in forward-porting, I will just do it in 3.x and leave it to someone
else to back-port to 2.x, which will lower the chances of the back-port
ever occurring. I don't want to do this, but I am fighting damn hard
against burn-out at this point and if I have to choose between
complete burn-out and only working on the leading edge version of
Python, I will choose the latter. So I for one appreciate Christian
taking all of us into account in terms of the approach taken to make
our lives easier when we work on Python.

-Brett

From greg at krypto.org  Wed May 28 22:47:29 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Wed, 28 May 2008 13:47:29 -0700
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483D300B.5090309@egenix.com>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com>
	<483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com>
	<483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com>
	<483D300B.5090309@egenix.com>
Message-ID: <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com>

On Wed, May 28, 2008 at 3:12 AM, M.-A. Lemburg <mal at egenix.com> wrote:
> I'm beginning to wonder whether I'm the only one who cares about
> the Python 2.x branch not getting cluttered up with artifacts caused
> by a broken forward merge strategy.
>
> How can it be that we allow major C API changes such as the renaming
> of the PyString APIs to go into the trunk without discussion or
> a PEP ?

I do not consider it a C API change.  The API and ABI have not
changed.  Old code still compiles.  Old binaries still dynamically
load and work fine.  (I just confirmed this by importing a couple
python2.4 .so files into my non-debug build of 2.6 trunk)

All of the PyString APIs are the real implementations in 2.x and are
still there.  We only switched to using their PyBytes equivalent names
within the Python trunk code base.

Are you objecting to our own code switching to use a different name
even though the actual underlying API and ABI haven't changed?  I
suppose to people reading the code and going against old reference
books it could be confusing but they've got to get used to the new
names somehow and sometime.

I strongly support changes like this one that make porting C code
forwards and backwards between 2.x and 3.x easier without breaking
compatibility with earlier 2.x versions, because otherwise that is
going to be a serious pain for all of us.

-gps

From greg.ewing at canterbury.ac.nz  Thu May 29 02:31:52 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Thu, 29 May 2008 12:31:52 +1200
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To: <e7ba66e40805280735g291eb150k90eef21afcdb8f1@mail.gmail.com>
References: <g1675f$hf4$1@ger.gmane.org> <g1hvto$kh3$1@ger.gmane.org>
	<483CBE7E.9080902@canterbury.ac.nz>
	<e7ba66e40805280735g291eb150k90eef21afcdb8f1@mail.gmail.com>
Message-ID: <483DF978.70203@canterbury.ac.nz>

Lisandro Dalcin wrote:
> You emit a
> Comm.Recv_init() call with the pointer to the buffer receiving the
> message (then you have to ask the object for the buffer pointer).
> ... Then when you initiate the communication, we should
> lock the object

No, you can't rely on a buffer pointer returned earlier
if the object may have been unlocked in the meantime.

The right thing to do in this case is just keep a
reference to the object whose buffer you're going to
be storing the result in. Then when it comes time to
start the receive, you obtain the buffer pointer and
lock the object at the same time.
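
At the Python level the same pattern can be sketched with a memoryview,
assuming a current CPython 3, where a bytearray refuses to be resized while
a view on it is exported. This is an analogy for the acquire-at-start idea,
not the C-level lock flag under discussion, and `start_receive` is a
hypothetical stand-in for the real communication call.

```python
def start_receive(target, payload):
    """Hypothetical receive: acquire the buffer only when the transfer starts."""
    with memoryview(target) as view:       # buffer is acquired here, not earlier
        try:
            target.extend(b"!")            # resizing is refused while exported
            resized = True
        except BufferError:
            resized = False
        view[:len(payload)] = payload      # write through the live view
    return resized                         # view is released on leaving the block

buf = bytearray(8)                         # hold only a reference until needed
resized_during = start_receive(buf, b"hello")
buf.extend(b"!")                           # allowed again after the release
print(resized_during, bytes(buf))
```

Nothing is cached between setup and start, so there is no stale pointer to
worry about.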

-- 
Greg

From allyourcode at gmail.com  Thu May 29 03:23:51 2008
From: allyourcode at gmail.com (Daniel Wong)
Date: Wed, 28 May 2008 18:23:51 -0700
Subject: [Python-3000] suggestion: structured assignment
Message-ID: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>

Hi,

Are there plans for introducing syntax like this:

(a, (b[2], c)) = ('big' ('red', 'dog'))

It seems quite doable, because Professor Hillfinger at UC Berkeley
created pyth, a dialect of Python, which has this feature. See page 10
of the spec he created for his students to implement the language:

http://inst.eecs.berkeley.edu/~cs164/sp08/docs/pyth.pdf

Of course, this idea could also be applied to 'for' constructs (loops,
list comprehensions, and generators) where assignments are implicit.

Parallel looping (esp using zip) is a great use case for this. Here's
a case that's come up more than once for me that "structured"
assignments would solve really nicely:

for n, (a, b) in enumerate(list_of_pairs): ...

Currently, I must do the following instead:

for n, pair in enumerate(list_of_pairs):
  a, b = pair
  ...

This isn't such a great solution, because there's more indirection
with the introduction of an otherwise useless variable; and (less
significantly) there's an extra line of code that doesn't actually
compute anything.

Thoughts?

Daniel

PS: Sorry if this has already been discussed; I'm new to this list and
I didn't see this mentioned in PEP 3099, unless it's covered under the
LL(1) clause.

From musiccomposition at gmail.com  Thu May 29 03:26:12 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Wed, 28 May 2008 20:26:12 -0500
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
Message-ID: <1afaf6160805281826t5993094ck2a6c179a31c1e91@mail.gmail.com>

Hi Daniel,

At the moment, we are preparing to ship betas, so this kind of
proposal is a little late for 2.6/3.0. Also, I would recommend trying
this on the python-ideas mailing list first.


-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From mike.klaas at gmail.com  Thu May 29 03:29:54 2008
From: mike.klaas at gmail.com (Mike Klaas)
Date: Wed, 28 May 2008 18:29:54 -0700
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
Message-ID: <72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com>


On 28-May-08, at 6:23 PM, Daniel Wong wrote:

> Currently, I must do the following instead:
>
> for n, pair in enumerate(list_of_pairs):
>  a, b = pair
>  ...
>
> <>
> Thoughts?

I find it hard to believe that you have even attempted this, which has  
been valid in python for ages:

 >>> for x, (a, b) in enumerate([(1,2), (3,4), (5,6)]):
             print x, a, b

0 1 2
1 3 4
2 5 6

-Mike

From allyourcode at gmail.com  Thu May 29 06:34:14 2008
From: allyourcode at gmail.com (allyourcode at gmail.com)
Date: Wed, 28 May 2008 21:34:14 -0700
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
	<72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com>
Message-ID: <7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com>

Well, I'm sorry for bothering his majesty with such a stupid idea. At
least one other person didn't know about it either...

On 5/28/08, Mike Klaas <mike.klaas at gmail.com> wrote:
>
> On 28-May-08, at 6:23 PM, Daniel Wong wrote:
>
>> Currently, I must do the following instead:
>>
>> for n, pair in enumerate(list_of_pairs):
>>  a, b = pair
>>  ...
>>
>> <>
>> Thoughts?
>
> I find it hard to believe that you have even attempted this, which has
> been valid in python for ages:
>
>  >>> for x, (a, b) in enumerate([(1,2), (3,4), (5,6)]):
>              print x, a, b
>
> 0 1 2
> 1 3 4
> 2 5 6
>
> -Mike
>

From brett at python.org  Thu May 29 06:38:05 2008
From: brett at python.org (Brett Cannon)
Date: Wed, 28 May 2008 21:38:05 -0700
Subject: [Python-3000] Finishing up PEP 3108
Message-ID: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>

The issues related to PEP 3108 now total 14. With the beta
(supposedly) in a week, I am hoping the last minor details can be
pulled together or decisions made on what can be postponed and what
should definitely be considered a release blocker.

Issue 2847 - the aifc module still imports the cl module in 3.0.
Problem is that the cl module is gone. =) So it seems silly to have
the imports lying about. This can probably be changed to critical.

Issue 2848 - mimetools has been deprecated for a while, but it is
still used in a bunch of places. Since this has been deprecated in PEP
4 for a long time, should we add the removal warning in 2.6 now and
then make its actual removal of usage something to do by another beta?

Issue 2849 - rfc822 is the same problem as mimetools.

Issue 2854 - gestalt needs to be added back into 3.0. This is
Benjamin's issue. =)

Issue 2873 - htmllib is slated to go, but pydoc still uses it. Then
again, pydoc is busted thanks to the new doc format.

Issue 2874 - the stat module is not so useful anymore, but it still
has functions that are useful. Currently the value returned by
os.stat() is a named tuple, but that won't support methods. So the
object returned by os.stat() needs to probably become a proper object
in posix.

Issue 2876 - The UserDict module has been removed in 3.0, but two
classes were moved and renamed in collections and another was removed.
The removal is a 3.0 warning, but the class renaming might be a tricky
2to3 fixer (not sure if the fix_imports fixer can be tweaked to handle
this).

Issue 2877 - UserString.UserString moved. Just need to apply the patch.

Issue 2878 - Ditto for UserList.

Issue 2885 - Creation of the urllib package. Jeremy has been working
on this. I believe his patch is up on rietveld.

Issue 2917 - This is merging pickle and cPickle. Alexandre's thing.

Issue 2918 - Same for StringIO/cStringIO.

Issue 2919 - profile and cProfile need to be merged. This has not
been dealt with yet. Would it be reasonable to deprecate importing
cProfile directly in 2.6 with the assumption the merge will work out
for 3.0?
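
For anyone porting code, here is a small sketch of where the User* classes
can be imported from in a released Python 3. That is an observation about
later releases rather than something settled in this message, and
`LowerDict` is a made-up example class.

```python
# In Python 3 the three wrapper classes live in the collections module.
from collections import UserDict, UserList, UserString

class LowerDict(UserDict):
    """Illustrative subclass: store every key in lower case."""
    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

d = LowerDict()
d["Content-Type"] = "text/plain"
print(d.data)                      # the wrapped plain dict

# The list and string wrappers keep their payload in .data as well.
print(UserList([1, 2]).data, UserString("abc").data)
```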

So that is everything that's left. Issue 2775 is the tracking issue so
you can look there to see what issues are still open and need work. I
was hoping to spend Monday and Tuesday trying to tie up as many loose
ends as possible, but the conference paper I have been working on that
was due Sunday is now due a week later, and so Monday and Tuesday will
be spent on that (supervisor's orders). Plus I am flying out Wednesday
for 10 days to help my mother move and I don't know when I will get
Net again. In other words, I still need help. =)

-Brett

P.S.: A huge thanks goes to everyone who has helped so far. My life
has been nothing but stress for a while now and you guys have helped
keep the stress from reaching epic proportions.

From guido at python.org  Thu May 29 06:47:59 2008
From: guido at python.org (Guido van Rossum)
Date: Wed, 28 May 2008 21:47:59 -0700
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
Message-ID: <ca471dc20805282147j1cefed8flc0749f95beb82214@mail.gmail.com>

Apart from the missing comma after 'big' this is already supported.

The time machine strikes again!
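
A minimal sketch of the assignment with the comma added. The list `b` has to
exist beforehand because `b[2]` is a subscript target, and the sample values
in `list_of_pairs` are made up for illustration.

```python
b = [None, None, None]                     # b[2] needs an existing slot

# Nested targets on the left mirror the nested tuple on the right.
(a, (b[2], c)) = ('big', ('red', 'dog'))
print(a, b, c)

# The same nesting works for the implicit assignment in a for loop.
list_of_pairs = [(1, 2), (3, 4)]
rows = []
for n, (x, y) in enumerate(list_of_pairs):
    rows.append((n, x, y))
print(rows)
```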

--Guido

On Wed, May 28, 2008 at 6:23 PM, Daniel Wong <allyourcode at gmail.com> wrote:
> Hi,
>
> Are there plans for introducing syntax like this:
>
> (a, (b[2], c)) = ('big' ('red', 'dog'))
>
> It seems quite doable, because Professor Hillfinger at UC Berkeley
> created pyth, a dialect of Python, which has this feature. See page 10
> of the spec he created for his students to implement the language:
>
> http://inst.eecs.berkeley.edu/~cs164/sp08/docs/pyth.pdf
>
> Of course, this idea could also be applied to 'for' constructs (loops,
> list comprehensions, and generators) where assignments are implicit.
>
> Parallel looping (esp using zip) is a great use case for this. Here's
> a case that's come up more than once for me that "structured"
> assignments would solve really nicely:
>
> for n, (a, b) in enumerate(list_of_pairs): ...
>
> Currently, I must do the following instead:
>
> for n, pair in enumerate(list_of_pairs):
>  a, b = pair
>  ...
>
> This isn't such a great solution, because there's more indirection
> with the introduction of an otherwise useless variable; and (less
> significantly) there's an extra line of code that doesn't actually
> compute anything.
>
> Thoughts?
>
> Daniel
>
> PS: Sorry if this has already been discussed; I'm new to this list and
> I didn't see this mentioned in PEP 3099, unless it's covered under the
> LL(1) clause.
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From allyourcode at gmail.com  Thu May 29 06:51:11 2008
From: allyourcode at gmail.com (allyourcode at gmail.com)
Date: Wed, 28 May 2008 21:51:11 -0700
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
	<72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com>
	<7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com>
Message-ID: <7c8225f20805282151l2740ea43nb48fe8f71979a676@mail.gmail.com>

I just looked through the official tutorial and Dive into Python, and
didn't find anything about it in either of those places. While this
feature is documented in the language reference, it does not seem to
be a well-known feature (another example: at least one other person
did not know about it).

On 5/28/08, allyourcode at gmail.com <allyourcode at gmail.com> wrote:
> Well, I'm sorry for bothering his majesty with such a stupid idea. At
> least one other person didn't know about it either...
>
> On 5/28/08, Mike Klaas <mike.klaas at gmail.com> wrote:
>>
>> On 28-May-08, at 6:23 PM, Daniel Wong wrote:
>>
>>> Currently, I must do the following instead:
>>>
>>> for n, pair in enumerate(list_of_pairs):
>>>  a, b = pair
>>>  ...
>>>
>>> <>
>>> Thoughts?
>>
>> I find it hard to believe that you have even attempted this, which has
>> been valid in python for ages:
>>
>>  >>> for x, (a, b) in enumerate([(1,2), (3,4), (5,6)]):
>>              print x, a, b
>>
>> 0 1 2
>> 1 3 4
>> 2 5 6
>>
>> -Mike
>>
>

From allyourcode at gmail.com  Thu May 29 06:52:28 2008
From: allyourcode at gmail.com (allyourcode at gmail.com)
Date: Wed, 28 May 2008 21:52:28 -0700
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <ca471dc20805282147j1cefed8flc0749f95beb82214@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
	<ca471dc20805282147j1cefed8flc0749f95beb82214@mail.gmail.com>
Message-ID: <7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com>

Indeed. Thank you, Guido.

On 5/28/08, Guido van Rossum <guido at python.org> wrote:
> Apart from the missing comma after 'big' this is already supported.
>
> The time machine strikes again!
>
> --Guido
>
> On Wed, May 28, 2008 at 6:23 PM, Daniel Wong <allyourcode at gmail.com> wrote:
>> Hi,
>>
>> Are there plans for introducing syntax like this:
>>
>> (a, (b[2], c)) = ('big' ('red', 'dog'))
>>
>> It seems quite doable, because Professor Hillfinger at UC Berkeley
>> created pyth, a dialect of Python, which has this feature. See page 10
>> of the spec he created for his students to implement the language:
>>
>> http://inst.eecs.berkeley.edu/~cs164/sp08/docs/pyth.pdf
>>
>> Of course, this idea could also be applied to 'for' constructs (loops,
>> list comprehensions, and generators) where assignments are implicit.
>>
>> Parallel looping (esp using zip) is a great use case for this. Here's
>> a case that's come up more than once for me that "structured"
>> assignments would solve really nicely:
>>
>> for n, (a, b) in enumerate(list_of_pairs): ...
>>
>> Currently, I must do the following instead:
>>
>> for n, pair in enumerate(list_of_pairs):
>>  a, b = pair
>>  ...
>>
>> This isn't such a great solution, because there's more indirection
>> with the introduction of an otherwise useless variable; and (less
>> significantly) there's an extra line of code that doesn't actually
>> compute anything.
>>
>> Thoughts?
>>
>> Daniel
>>
>> PS: Sorry if this has already been discussed; I'm new to this list and
>> I didn't see this mentioned in PEP 3099, unless it's covered under the
>> LL(1) clause.
>>
>
>
>
> --
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>

From allyourcode at gmail.com  Thu May 29 07:55:34 2008
From: allyourcode at gmail.com (Daniel Wong)
Date: Wed, 28 May 2008 22:55:34 -0700
Subject: [Python-3000] non-local assignment
Message-ID: <7c8225f20805282255m52040954w78f3e27344f39608@mail.gmail.com>

I'm confused by the section on "no alternate binding operator" in PEP
3099. On the one hand, it says no alternative binding operator will be
considered; yet the link provided shows that Guido is in favor of
developing a syntax for non-local assignment. Please excuse me if this
post violates that rule. Here's my suggestion on what the syntax
should look like:

set! var val

Scheme users will recognize this syntax, which has the distinct
advantage of not being confusable with regular assignment; whereas,
this is an unfortunate feature of :=, which Guido has already
rejected.

The way this is supposed to work is you go to the inner-most scope in
which var is declared and change its value there to val. If var does
not occur in any containing scope, you could raise an
UndeclaredVariable exception.

Thoughts?

Daniel

From cvrebert at gmail.com  Thu May 29 08:08:02 2008
From: cvrebert at gmail.com (Chris Rebert)
Date: Wed, 28 May 2008 23:08:02 -0700
Subject: [Python-3000] non-local assignment
In-Reply-To: <7c8225f20805282255m52040954w78f3e27344f39608@mail.gmail.com>
References: <7c8225f20805282255m52040954w78f3e27344f39608@mail.gmail.com>
Message-ID: <47c890dc0805282308k2bfd636aw66596546cef619da@mail.gmail.com>

It's been decided to go w/ the "nonlocal" keyword to declare outer
variables (ala the "global" keyword) rather than using an alternate
assignment operator (which was one of the competing proposals). It's
too late to make a change such as your suggestion because PEP 3104 (
http://www.python.org/dev/peps/pep-3104/ ), which proposed "nonlocal",
has already been accepted (and BDFL-blessed IIRC).
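
For reference, a short sketch of how "nonlocal" reads in Python 3. The names
`make_counter` and `increment` are made up for the example. A name with no
binding in any enclosing function scope is rejected when the code is
compiled, not with a runtime exception.

```python
def make_counter():
    count = 0
    def increment():
        nonlocal count          # rebind the variable in the enclosing scope
        count += 1
        return count
    return increment

counter = make_counter()
counter()
print(counter())                # second call sees the updated value

# A nonlocal declaration without an enclosing binding fails at compile time.
try:
    compile("def f():\n    nonlocal missing\n", "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```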

Furthermore, there's no precedent for Python operators to use both a
keyword and punctuation together like "set!", and "set" can't be used
instead as it's the name of a builtin type (in Py3K).

In the future, searching the list archives can be quite helpful.

- Chris Rebert


On Wed, May 28, 2008 at 10:55 PM, Daniel Wong <allyourcode at gmail.com> wrote:
> I'm confused by the section on "no alternate binding operator" in PEP
> 3099. On the one hand, it says no alternative binding operator will be
> considered; yet the link provided shows that Guido is in favor of
> developing a syntax for non-local assignment. Please excuse me if this
> post violates that rule. Here's my suggestion on what the syntax
> should look like:
>
> set! var val
>
> Scheme users will recognize this syntax, which has the distinct
> advantage of not being confusable with regular assignment; whereas,
> this is an unfortunate feature of :=, which Guido has already
> rejected.
>
> The way this is supposed to work is you go to the inner-most scope in
> which var is declared and change its value there to val. If var does
> not occur in any containing scope, you could raise an
> UndeclaredVariable exception.
>
> Thoughts?
>
> Daniel
>

From stefan_ml at behnel.de  Thu May 29 08:15:25 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 29 May 2008 08:15:25 +0200
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To: <g1hvto$kh3$1@ger.gmane.org>
References: <g1675f$hf4$1@ger.gmane.org> <g1hvto$kh3$1@ger.gmane.org>
Message-ID: <g1lhlt$988$1@ger.gmane.org>

Travis Oliphant wrote:
> Stefan Behnel wrote:
>> Anyway, my point is that this part of the protocol actually implies
>> setting a
>> lock on the buffer *provider* rather than the buffer itself, as the
>> buffer
>> provider cannot distinguish between different buffers based on a NULL
>> pointer
> 
> Yes, the language in the PEP could be more clear.   Obviously, if you
> haven't provided a Py_buffer structure to fill in, then you are only
> asking to lock the object's buffer from other access.

That's what I'm questioning below.


> Naturally, the exporter should handle the case when no lock is actually
> requested.

That would be considered a bug, right? So it should raise an exception? I
can't find that in the PEP.


>> But wouldn't it make more sense to *always*
>> pass the buffer pointer, to let the provider decide what it makes of the
>> flags?
> 
> Perhaps we are not understanding each other.  The Py_buffer structure
> and the buffer pointer are 2 separate things.

I know, I wasn't clear, but I actually meant what I said: the buffer pointer
may not be without interest. Imagine the case that a provider decides to
create more than one buffer, maybe one for read-only access and one for each
concurrent request for write access (and then merge the changes back on
release). Then creating the lock by passing NULL as Py_buffer would set a lock
on the provider object, not the respective write buffer (or even the
read-buffer, where no lock is required). That would be hard to handle by the
provider.


> the buf member of
> the structure is the actual buffer pointer and it is un-defined when
> getbuffer is called and it contains the buffer pointer on successful
> return.

But that's only for the buffer creation case. A lock request could just pass
in the correct buffer and set the LOCK flag. That doesn't even change the
single buffer case (where overwriting the buffer pointer with itself does no
harm), but it enables the multiple buffer case.

Stefan


From allyourcode at gmail.com  Thu May 29 08:24:47 2008
From: allyourcode at gmail.com (allyourcode at gmail.com)
Date: Wed, 28 May 2008 23:24:47 -0700
Subject: [Python-3000] non-local assignment
In-Reply-To: <47c890dc0805282308k2bfd636aw66596546cef619da@mail.gmail.com>
References: <7c8225f20805282255m52040954w78f3e27344f39608@mail.gmail.com>
	<47c890dc0805282308k2bfd636aw66596546cef619da@mail.gmail.com>
Message-ID: <7c8225f20805282324j149d5d53hdcce7de28c69eb75@mail.gmail.com>

I actually read a good portion of the thread that PEP 3099 refers to,
so I thought I had read up on the subject before making my suggestion.
I had also perused that PEP and didn't realize there was no way my
suggestion could be accepted.

I suppose it's too late, but I think it's too bad that a negative
keyword was selected, although it is completely accurate.

On 5/28/08, Chris Rebert <cvrebert at gmail.com> wrote:
> It's been decided to go w/ the "nonlocal" keyword to declare outer
> variables (ala the "global" keyword) rather than using an alternate
> assignment operator (which was one of the competing proposals). It's
> too late to make a change such as your suggestion because PEP 3104 (
> http://www.python.org/dev/peps/pep-3104/ ), which proposed "nonlocal",
> has already been accepted (and BDFL-blessed IIRC).
>
> Furthermore, there's no precedent for Python operators to use both a
> keyword and punctuation together like "set!", and "set" can't be used
> instead as it's the name of a builtin type (in Py3K).
>
> In the future, searching the list archives can be quite helpful.
>
> - Chris Rebert
>
>
> On Wed, May 28, 2008 at 10:55 PM, Daniel Wong <allyourcode at gmail.com> wrote:
>> I'm confused by the section on "no alternate binding operator" in PEP
>> 3099. On the one hand, it says no alternative binding operator will be
>> considered; yet the link provided shows that Guido is in favor of
>> developing a syntax for non-local assignment. Please excuse me if this
>> post violates that rule. Here's my suggestion on what the syntax
>> should look like:
>>
>> set! var val
>>
>> Scheme users will recognize this syntax, which has the distinct
>> advantage of not being confusable with regular assignment; whereas,
>> this is an unfortunate feature of :=, which Guido has already
>> rejected.
>>
>> The way this is supposed to work is you go to the inner-most scope in
>> which var is declared and change its value there to val. If var does
>> not occur in any containing scope, you could raise an
>> UndeclaredVariable exception.
>>
>> Thoughts?
>>
>> Daniel
>>
>

From ishimoto at gembook.org  Thu May 29 08:40:22 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Thu, 29 May 2008 15:40:22 +0900
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
Message-ID: <797440730805282340h2ea9f8dfqba91f0e67f7e273e@mail.gmail.com>

On Tue, May 27, 2008 at 10:06 AM, Jim Jewett <jimjjewett at gmail.com> wrote:
>>   * Characters defined in the Unicode character database as "Separator"
>>     (Zl, Zp, Zs) other than ASCII space(0x20).
>
> Please put in a note that  Zl and Zp refer only to two specific
> unicode characters, not to what most people think of as line
> separators or paragraph markers.

Thank you for the suggestion.

>
>>   * Backslash-escape quote characters(apostrophe, ') and add quote
>>     character at the beginning and the end.
>
> Do you just mean the two ASCII quotation marks  that python uses?

No, just an apostrophe (') as in current Python.

>
> As written, I wondered whether it would include backquote or guillemet.

A proposal to change repr() for these characters is not included in this
PEP, although I don't know what a guillemet is.

>
>>  - Add ``'%a'`` string format operator. ``'%a'`` converts any python
>>   object to string using ``repr()`` and then hex-escape all non-ASCII
>>   characters. ``'%a'`` operator generates same string as ``'%r'`` in
>>   Python 2.
>
> Then why not keep the old %r, and add a new one for the unicode repr?
>

repr() and "%r" should be consistent with the object's __repr__() method.

> Is it again because of the bug where str([..., mystr, ...])   ends up
> doing repr on mystr?

I don't think it is a bug, as other people have explained.

>
>>  - Add ``ascii()`` builtin function. ``ascii()`` converts any python
>>   object to string using ``repr()`` and then hex-escape all non-ASCII
>>   characters. ``ascii()`` generates same string as ``repr()`` in Python 2.
>
> The problem isn't that I want to be able to write code that acts the
> old way; the problem is that I want to ensure all code running on my
> system acts the old way.
> Adding an ascii() function doesn't help.

I can understand your worry about possible code breakage, but I still
think this PEP is the right thing for Python 3000. ascii() may make
porting code to Python 3000 a bit easier.

>
>>   Strings to be printed for debugging are not only contained by lists or
>>   dicts, but also in many other types of object. File objects contain a
>>   file name in Unicode, exception objects contain a message in Unicode,
>>   etc. These strings should be printed in readable form when repr()ed.
>>   It is unlikely to be possible to implement a tool to print all
>>   possible object types.
>
> You could go a long way (particularly in Py3k, where everything
> inherits from object) by changing the builtin containers, and changing

Changing the builtin containers is not sufficient, so that way would be too
long to be practical. Do you wish to override the __repr__() method of all
the types you encounter?

>>  - Make the encoding used by ``unicode_repr()`` adjustable, and make
>>   current ``repr()`` as default.
>
>>   With adjustable ``repr()``, result of ``repr()`` is unpredictable and
>>   would make impossible to write correct code involving ``repr()``.
>
> No more so than 3138.  The setting of repr is predictable on a given
> system.  (Even if you make it a changeable during a single run, it is
> predictable by checking first.)  Across systems, the 3138 proposal is
> already unpredictable, because you don't know which systems will apply
> backslash-replace on which characters (and on which runs).
>

In this PEP, the result of repr() is perfectly predictable: repr()
generates exactly the same string on every system. But in general, strings
printed to the console, whether generated by repr() or not, are less
predictable. Depending on the user's configuration, some characters in
the string may be backslash-escaped, may be replaced by '?', or may
raise an exception.
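
The distinction between repr(), ascii() and '%a' can be sketched as
follows. This assumes the behaviour of a released Python 3 interpreter
(repr() keeps printable non-ASCII characters, ascii() and '%a' escape
them); the sample string is made up for illustration, and the last line
only imitates what a console configured with an ASCII encoding and the
backslashreplace error handler might show.

```python
# Illustrative sketch, assuming a released Python 3 interpreter.
s = "caf\u00e9"  # hypothetical sample: 'café', one non-ASCII character

readable = repr(s)     # printable non-ASCII characters are kept as-is
escaped = ascii(s)     # like repr(), but non-ASCII is backslash-escaped
formatted = "%a" % s   # the '%a' operator matches ascii()

# Imitation of an ASCII-only console that backslash-escapes on output.
console_safe = readable.encode("ascii", "backslashreplace").decode("ascii")

print(readable)
print(escaped)
print(formatted)
print(console_safe)
```

The value returned by repr() is the same everywhere; only the final
encoding step depends on the environment.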

From ishimoto at gembook.org  Thu May 29 08:40:50 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Thu, 29 May 2008 15:40:50 +0900
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
	<fb6fbf560805271312g41c6b240h322c7142d1cf98be@mail.gmail.com>
	<797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com>
Message-ID: <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com>

On Wed, May 28, 2008 at 5:12 AM, Jim Jewett <jimjjewett at gmail.com> wrote:
>>  >>  - Add ``'%a'`` string format operator. ``'%a'`` converts any python
>>  >>   object to string using ``repr()`` and then hex-escape all non-ASCII
>>  >>   characters. ``'%a'`` operator generates same string as ``'%r'`` in
>>  >>   Python 2.
>
>>  > Then why not keep the old %r, and add a new one for the unicode repr?
>
>> repr() and "%r" should be consistent with the object's __repr__() method.
>
> Let me rephrase that:
>
> Why change repr and add a replacement that acts like old repr?

The "%a" and ascii() were not in my original proposal, but were proposed in
this discussion. I added them to the PEP, but I'm still not sure they
are necessary.

>
> Wouldn't it be easier to just add a new function (and format
> character) that act in the desirable new way?  That way there are no
> backwards compatibility problems, and people who use it will make an
> explicit choice that can be trusted.

Adding a new function is not enough; we would also have to define a new
protocol for types, such as __unicode_repr__(), and implement it. For example,
the list type would have to implement a method which does almost the same job as
__repr__().

class List:
   def __repr__(self):
       return "[%s]" % ",".join(repr(s) for s in self._items)

   def __unicode_repr__(self):
       return "[%s]" % ",".join(unicode_repr(s) for s in self._items)

I think keeping old repr() is not worth this effort.

> What I really want is that the
>
>    "No str?  Use repr instead"
>
> fallback change into
>
> "No str?  Use repr on *this* object instead, but keep using str on
> subobjects if those are printed"
>

Even if Python changed to call str() on subobjects as you want, I would
still insist on PEP 3138. The result of str() is not always
relevant to debugging, whereas repr() is designed for debugging. str()
cannot be a replacement for repr().
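
The fallback under discussion (a container without its own str() falls
back to repr(), which in turn calls repr() on each item) can be sketched
like this. Tag is a hypothetical class invented for the example, and the
behaviour shown assumes a released Python 3 interpreter.

```python
# Illustrative sketch; Tag is a hypothetical class, not part of the PEP.
class Tag:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name              # user-facing form

    def __repr__(self):
        return "Tag(%r)" % self.name  # debugging form

t = Tag("caf\u00e9")
direct = str(t)      # uses Tag.__str__
in_list = str([t])   # list falls back to its repr(), which repr()s each item

print(direct)
print(in_list)
```

With the repr() proposed in PEP 3138 the non-ASCII character stays
readable in both forms, which is the debugging case the PEP is aimed at.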

From stefan_ml at behnel.de  Thu May 29 09:22:22 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 29 May 2008 09:22:22 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID: <g1llje$jma$1@ger.gmane.org>

Christian Heimes wrote:
>  * add a new file stringobject.h which contains the aliases PyString_ ->
> PyBytes_

Just a quick note that that file is still missing from SVN, so it's kind of
hard to compile existing code against the current branch state...

Stefan


From jcea at jcea.es  Thu May 29 09:34:18 2008
From: jcea at jcea.es (Jesus Cea)
Date: Thu, 29 May 2008 09:34:18 +0200
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
 (Stabilizing the C API of 2.6 and 3.0)
In-Reply-To: <483D5444.8000705@egenix.com>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>
	<483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com>
Message-ID: <483E5C7A.2090507@jcea.es>

M.-A. Lemburg wrote:
| If you use PyBytes APIs, you expect to find PyBytes functions in
| the libs and also set breakpoints on these.

Very good point.

-- 
Jesus Cea Avion                         _/_/      _/_/_/        _/_/_/
jcea at jcea.es - http://www.jcea.es/     _/_/    _/_/  _/_/    _/_/  _/_/
jabber / xmpp:jcea at jabber.org         _/_/    _/_/          _/_/_/_/_/
~                               _/_/  _/_/    _/_/          _/_/  _/_/
"Things are not so easy"      _/_/  _/_/    _/_/  _/_/    _/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/        _/_/_/      _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz

From stefan_ml at behnel.de  Thu May 29 09:57:27 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 29 May 2008 09:57:27 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <e7ba66e40805270639r15b624a8rb330e03831c8a155@mail.gmail.com>
References: <48397ECC.9070805@cheimes.de>
	<e7ba66e40805270639r15b624a8rb330e03831c8a155@mail.gmail.com>
Message-ID: <g1lnl7$pjh$1@ger.gmane.org>

Lisandro Dalcin wrote:
> Christian, I posted some observations a few weeks ago about the status
> of the PyNumberMethods API. The thread link is below; it did not receive
> much attention.
> 
> http://mail.python.org/pipermail/python-3000/2008-May/013594.html
> 
> Now I summarize that post
> 
> * 'nb_nonzero' was renamed to 'nb_bool'

That's a non-critical change. Usage of these field names outside of the Python
core should be extremely rare.


> * 'nb_inplace_divide' was removed

as was nb_divide, apparently, which is pretty close to the beginning of the
struct.


> * 'nb_hex', 'nb_oct', and 'nb_coerce' are there, but they are unused
> 
> IMHO, the PyNumberMethods struct should be left as in Py2, or it
> should be cleaned up, that is, all unused slots should be removed.

Since there were already two fields right inside the struct that were removed
(one even before the three you mention), I think it makes sense to remove the
remaining left-overs also. I filed a bug.

http://bugs.python.org/issue2997
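
For readers who only see these slots from the Python side, the renames
discussed above can be sketched as follows. This assumes a released
Python 3 interpreter, where truth testing looks up __bool__ (the method
behind nb_bool) and the / operator looks up __truediv__. Quantity and
Legacy are hypothetical classes made up for the example.

```python
# Illustrative sketch, assuming Python 3; both classes are hypothetical.
class Quantity:
    def __init__(self, value):
        self.value = value

    def __bool__(self):               # Python 3 name, behind the nb_bool slot
        return self.value != 0

    def __truediv__(self, other):     # handles the / operator in Python 3
        return Quantity(self.value / other.value)

    def __floordiv__(self, other):    # handles the // operator
        return Quantity(self.value // other.value)


class Legacy:
    def __nonzero__(self):            # Python 2 spelling, ignored by Python 3
        return False


zero_is_true = bool(Quantity(0))
half = (Quantity(1) / Quantity(2)).value
whole = (Quantity(7) // Quantity(2)).value
legacy_is_true = bool(Legacy())       # falls back to the default: True

print(zero_is_true, half, whole, legacy_is_true)
```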

Stefan


From stefan_ml at behnel.de  Thu May 29 10:30:55 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 29 May 2008 10:30:55 +0200
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
In-Reply-To: <483D5444.8000705@egenix.com>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>
	<483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com>
Message-ID: <g1lpjv$vu3$1@ger.gmane.org>

M.-A. Lemburg wrote:
> If you use PyBytes APIs, you expect to find PyBytes functions in
> the libs and also set breakpoints on these.

AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here.

Besides, how likely is it that users set a breakpoint on the PyBytes/PyString
functions?

Stefan


From paul.bedaride at gmail.com  Thu May 29 10:50:25 2008
From: paul.bedaride at gmail.com (paul bedaride)
Date: Thu, 29 May 2008 10:50:25 +0200
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
	<ca471dc20805282147j1cefed8flc0749f95beb82214@mail.gmail.com>
	<7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com>
Message-ID: <fa7d4c4f0805290150u68f98a1m66963db3360e4880@mail.gmail.com>

This works:          (a, (b[2], c)) = ('big', ('red', 'dog'))
but this does not:   (a, (b[2], c)) += ('big', ('red', 'dog'))
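
A small sketch of the difference, assuming a current Python 3
interpreter: nested targets are accepted by plain assignment and by
'for' targets, while augmented assignment rejects a tuple of targets at
compile time. The data values are made up for illustration.

```python
# Illustrative sketch; the variable names follow the example in the message.
b = [None, None, None]
(a, (b[2], c)) = ("big", ("red", "dog"))   # nested targets, including a subscript

pairs = [(1, 2), (3, 4)]                   # hypothetical data
unpacked = [(n, x, y) for n, (x, y) in enumerate(pairs)]

# Augmented assignment does not accept a tuple of targets, so this source
# is rejected when it is compiled, before anything runs.
try:
    compile("(a, (b[2], c)) += ('big', ('red', 'dog'))", "<example>", "exec")
    augmented_ok = True
except SyntaxError:
    augmented_ok = False

print(a, b, c, unpacked, augmented_ok)
```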

paul bedaride

On Thu, May 29, 2008 at 6:52 AM, <allyourcode at gmail.com> wrote:

> Indeed. Thank you, Guido.
>
> On 5/28/08, Guido van Rossum <guido at python.org> wrote:
> > Apart from the missing comma after 'big' this is already supported.
> >
> > The time machine strikes again!
> >
> > --Guido
> >
> > On Wed, May 28, 2008 at 6:23 PM, Daniel Wong <allyourcode at gmail.com>
> wrote:
> >> Hi,
> >>
> >> Are there plans for introducing syntax like this:
> >>
> >> (a, (b[2], c)) = ('big' ('red', 'dog'))
> >>
> >> It seems quite doable, because Professor Hilfinger at UC Berkeley
> >> created pyth, a dialect of Python, which has this feature. See page 10
> >> of the spec he created for his students to implement the language:
> >>
> >> http://inst.eecs.berkeley.edu/~cs164/sp08/docs/pyth.pdf
> >>
> >> Of course, this idea could also be applied to 'for' constructs (loops,
> >> list comprehensions, and generators) where assignments are implicit.
> >>
> >> Parallel looping (esp using zip) is a great use case for this. Here's
> >> a case that's come up more than once for me that "structured"
> >> assignments would solve really nicely:
> >>
> >> for n, (a, b) in enumerate(list_of_pairs): ...
> >>
> >> Currently, I must do the following instead:
> >>
> >> for n, pair in enumerate(list_of_pairs):
> >>  a, b = pair
> >>  ...
> >>
> >> This isn't such a great solution, because there's more indirection
> >> with the introduction of an otherwise useless variable; and (less
> >> significantly) there's an extra line of code that doesn't actually
> >> compute anything.
> >>
> >> Thoughts?
> >>
> >> Daniel
> >>
> >> PS: Sorry if this has already been discussed; I'm new to this list and
> >> I didn't see this mentioned in PEP 3099, unless it's covered under the
> >> LL(1) clause.
> >
> >
> >
> > --
> > --Guido van Rossum (home page: http://www.python.org/~guido/)
> >

From wescpy at gmail.com  Thu May 29 10:56:55 2008
From: wescpy at gmail.com (wesley chun)
Date: Thu, 29 May 2008 01:56:55 -0700
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
Message-ID: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>

hi,

i'm looking to duplicate this string format operator '#' functionality
with the new format(). here it is using the old string format
operator:

>>> i = 45
>>> 'dec: %d/oct: %o/hex: %X' % (i, i, i)         # no "#" means no leading "0o" or "0x/X"
'dec: 45/oct: 55/hex: 2D'
>>> 'dec: %d/oct: %#o/hex: %#X' % (i, i, i)     # leading "#" gives us "0o" and "0x/X"
'dec: 45/oct: 0o55/hex: 0X2D'

if i repeat both of the above with format(), it fails with the "#":

>>> 'dec: {0}/oct: {0:o}/hex: {0:X}'.format(i)
'dec: 45/oct: 55/hex: 2D'
>>> 'dec: {0}/oct: {0:#o}/hex: {0:#X}'.format(i)
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    'dec: {0}/oct: {0:#o}/hex: {0:#X}'.format(i)
ValueError: Invalid conversion specification

i have to resort to the uglier:

>>> 'dec: {0}/oct: 0o{0:o}/hex: 0X{0:X}'.format(i)
'dec: 45/oct: 0o55/hex: 0X2D'

is this functionality being dropped, or am i missing something?  i
didn't get anything from searching the Py3000 mailing list archives. i
couldn't find anything in either formatter.h nor stringobject.c.

secondly, and much more minor, is that i think there's a minor typo in the PEP:
print format(10.0, "7.3g")  <-- print() is now a function so it needs
another pair of ( ).

thanks,
-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
 http://corepython.com

wesley.j.chun :: wescpy-at-gmail.com
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com

From stefan_ml at behnel.de  Thu May 29 10:59:29 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 29 May 2008 10:59:29 +0200
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805282151l2740ea43nb48fe8f71979a676@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>	<72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com>	<7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com>
	<7c8225f20805282151l2740ea43nb48fe8f71979a676@mail.gmail.com>
Message-ID: <g1lr9g$4un$1@ger.gmane.org>

allyourcode at gmail.com wrote:
> I just looked through the official tutorial and Dive into Python, and
> didn't find anything about it in either of those places.

Tutorial section on "tuples and sequences", not quite the most hidden place in
the universe.

http://docs.python.org/tut/node7.html#SECTION007300000000000000000

Stefan


From ncoghlan at gmail.com  Thu May 29 11:47:37 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 29 May 2008 19:47:37 +1000
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
Message-ID: <483E7BB9.5060002@gmail.com>

wesley chun wrote:
> i have to resort to the uglier:
> 
>>>> 'dec: {0}/oct: 0o{0:o}/hex: 0X{0:X}'.format(i)
> 'dec: 45/oct: 0o55/hex: 0X2D'

Is being explicit about the displayed prefix really that much uglier? 
The old # alternative display formats were somewhat arbitrary.

> is this functionality being dropped, or am i missing something?  i
> didn't get anything from searching the Py3000 mailing list archives. i
> couldn't find anything in either formatter.h nor stringobject.c.
> 
> secondly, and much more minor, is that i think there's a minor typo in the PEP:
> print format(10.0, "7.3g")  <-- print() is now a function so it needs
> another pair of ( ).

It works fine as written in 2.x :)

(but, yes, you're right that as a 3000-series PEP, 3101 should probably 
treat print() as a function in its examples)

Cheers,
Nick.
-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Thu May 29 11:53:05 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 29 May 2008 19:53:05 +1000
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
In-Reply-To: <g1lpjv$vu3$1@ger.gmane.org>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>	<483D49ED.8060907@cheimes.de>
	<483D5444.8000705@egenix.com> <g1lpjv$vu3$1@ger.gmane.org>
Message-ID: <483E7D01.4010603@gmail.com>

Stefan Behnel wrote:
> M.-A. Lemburg wrote:
>> If you use PyBytes APIs, you expect to find PyBytes functions in
>> the libs and also set breakpoints on these.
> 
> AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here.

The PyBytes_* functions appear to be there, but a preprocessor macro 
means it is actually the PyString_* functions that appear in the Python 
DLL. That's great from a backwards compatibility point of view, but 
seriously confusing from the point of view of anyone trying to embed or 
otherwise debug Python 2.6.

> Besides, how likely is it that users set a breakpoint on the PyBytes/PyString
> functions?

Not very likely at all - but it would still be nice if the PyBytes_* 
symbols were visible to the linker as well as the preprocessor.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From lists at cheimes.de  Thu May 29 11:59:28 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 May 2008 11:59:28 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <g1llje$jma$1@ger.gmane.org>
References: <48397ECC.9070805@cheimes.de> <g1llje$jma$1@ger.gmane.org>
Message-ID: <483E7E80.70403@cheimes.de>

Stefan Behnel schrieb:
> Christian Heimes wrote:
>>  * add a new file stringobject.h which contains the aliases PyString_ ->
>> PyBytes_
> 
> Just a quick note that that file is still missing from SVN, so it's kind of
> hard to compile existing code against the current branch state...

No, the file is in SVN. It's just not in the py3k branch because it's
not vital to the core.

I had plans to add a Python 2.x compatibility header to Python 3.0. But
I'm not going to spend any more time on the topic or any other
development until we have reached an agreement on the naming. I don't
want to waste more of my free time in vain.

Christian

From lists at cheimes.de  Thu May 29 12:01:44 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 May 2008 12:01:44 +0200
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
In-Reply-To: <g1lpjv$vu3$1@ger.gmane.org>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>	<483D49ED.8060907@cheimes.de>
	<483D5444.8000705@egenix.com> <g1lpjv$vu3$1@ger.gmane.org>
Message-ID: <483E7F08.80704@cheimes.de>

Stefan Behnel schrieb:
> M.-A. Lemburg wrote:
>> If you use PyBytes APIs, you expect to find PyBytes functions in
>> the libs and also set breakpoints on these.
> 
> AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here.

In Python 2.6 the PyBytes_* functions are only available to the compiler
but not to the linker. In 2.6 the ABI functions are PyString_* and in
3.0 it's PyBytes_*

Christian

From qrczak at knm.org.pl  Thu May 29 12:05:20 2008
From: qrczak at knm.org.pl (=?UTF-8?Q?Marcin_=E2=80=98Qrczak=E2=80=99_Kowalczyk?=)
Date: Thu, 29 May 2008 12:05:20 +0200
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
In-Reply-To: <483E7D01.4010603@gmail.com>
References: <48397ECC.9070805@cheimes.de> <483ABB23.6050900@egenix.com>
	<483ABDCF.8000105@cheimes.de> <483AD138.7000804@egenix.com>
	<483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com>
	<483D49ED.8060907@cheimes.de> <483D5444.8000705@egenix.com>
	<g1lpjv$vu3$1@ger.gmane.org> <483E7D01.4010603@gmail.com>
Message-ID: <3f4107910805290305s63b97e73i87824755f7aa31fb@mail.gmail.com>

2008/5/29 Nick Coghlan <ncoghlan at gmail.com>:

> it would still be nice if the PyBytes_* symbols
> were visible to the linker as well as the preprocessor.

If this is not a strict requirement but a useful extra, then it might
be done in an unportable way. GCC has an 'alias' attribute:
http://gcc.gnu.org/onlinedocs/gcc/Function-Attributes.html

-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/

From stefan_ml at behnel.de  Thu May 29 12:08:47 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Thu, 29 May 2008 12:08:47 +0200
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
In-Reply-To: <483E7F08.80704@cheimes.de>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>	<483D49ED.8060907@cheimes.de>	<483D5444.8000705@egenix.com>
	<g1lpjv$vu3$1@ger.gmane.org> <483E7F08.80704@cheimes.de>
Message-ID: <g1lvbe$k98$1@ger.gmane.org>

Christian Heimes wrote:
> Stefan Behnel schrieb:
>> M.-A. Lemburg wrote:
>>> If you use PyBytes APIs, you expect to find PyBytes functions in
>>> the libs and also set breakpoints on these.
>> AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here.
> 
> In Python 2.6 the PyBytes_* functions are only available to the compiler
> but not to the linker. In 2.6 the ABI functions are PyString_* and in
> 3.0 it's PyBytes_*

Ah, even better then. Given that it was always PyString_*() in Py2, that
totally sounds like the right thing to me. I really don't think anyone using
the newly advertised Py3 PyBytes_*() C-API functions will honestly expect them
to be available in a 2.x binary lib.

Stefan


From ncoghlan at gmail.com  Thu May 29 12:34:58 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Thu, 29 May 2008 20:34:58 +1000
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
In-Reply-To: <g1m0g4$nqi$1@ger.gmane.org>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>	<483D49ED.8060907@cheimes.de>	<483D5444.8000705@egenix.com>	<g1lpjv$vu3$1@ger.gmane.org>
	<483E7D01.4010603@gmail.com> <g1m0g4$nqi$1@ger.gmane.org>
Message-ID: <483E86D2.5000809@gmail.com>

Stefan Behnel wrote:
> Nick Coghlan wrote:
>> Stefan Behnel wrote:
>>> Besides, how likely is it that users set a breakpoint on the
>>> PyBytes/PyString functions?
>> Not very likely at all - but it would still be nice if the PyBytes_*
>> symbols were visible to the linker as well as the preprocessor.
> 
> Right, that's a nice-to-have, an add-on. Why don't we just let Christian
> finish his work, which is vital for the beta release? Then it's still time to
> file a bug report on the missing bits and provide a patch that adds linker
> symbols for PyBytes_*() in Py2.6 as an additional feature.

Yeah, it took me a while to get my head around what he was trying to do, 
but GPS explained it pretty well elsewhere in this thread.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From qgallet at gmail.com  Thu May 29 15:25:12 2008
From: qgallet at gmail.com (Quentin Gallet-Gilles)
Date: Thu, 29 May 2008 15:25:12 +0200
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <g1ll0v$i0p$1@ger.gmane.org>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>
	<g1ll0v$i0p$1@ger.gmane.org>
Message-ID: <8b943f2b0805290625w19ab1fd3l48f00f40e630c39d@mail.gmail.com>

On Thu, May 29, 2008 at 9:12 AM, Georg Brandl <g.brandl at gmx.net> wrote:

> Brett Cannon schrieb:
>
>> The issues related to PEP 3108 now total 14. With the beta
>> (supposedly) in a week, I am hoping the last minor details can be
>> pulled together or decisions made on what can be postponed and what
>> should definitely be considered a release blocker.
>>
>> Issue 2847 - the aifc module still imports the cl module in 3.0.
>> Problem is that the cl module is gone. =) So it seems silly to have
>> the imports lying about. This can probably be changed to critical.
>>
>
> It shouldn't be a problem to rip everything cl-related out of aifc.
> The question is how useful aifc will be after that ...
>

Has anyone already used that module? I took a look at it, but I'm a bit
confused about the various compression types, case-sensitivity and
compatibility issues [1]. Are Apple's "alaw" and SGI's "ALAW" really the
same encoding? Can we use the audioop module for ALAW, just like it's
already done for ULAW?

[1] http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/AIFF.html

Quentin

From eric+python-dev at trueblade.com  Thu May 29 15:28:53 2008
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 29 May 2008 09:28:53 -0400
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
Message-ID: <483EAF95.5050503@trueblade.com>

wesley chun wrote:
> hi,
> 
> i'm looking to duplicate this string format operator '#' functionality
> with the new format(). here it is using the old string format
> operator:
> 
>>>> i = 45
>>>> 'dec: %d/oct: %o/hex: %X' % (i, i, i)         # no "#" means no leading "0o" or "0x/X"
> 'dec: 45/oct: 55/hex: 2D'
>>>> 'dec: %d/oct: %#o/hex: %#X' % (i, i, i)     # leading "#" gives us "0o" and "0x/X"
> 'dec: 45/oct: 0o55/hex: 0X2D'
> 
> if i repeat both of the above with format(), it fails with the "#":
> 
>>>> 'dec: {0}/oct: {0:o}/hex: {0:X}'.format(i)
> 'dec: 45/oct: 55/hex: 2D'
>>>> 'dec: {0}/oct: {0:#o}/hex: {0:#X}'.format(i)
> Traceback (most recent call last):
>   File "<pyshell#33>", line 1, in <module>
>     'dec: {0}/oct: {0:#o}/hex: {0:#X}'.format(i)
> ValueError: Invalid conversion specification
> 
> i have to resort to the uglier:
> 
>>>> 'dec: {0}/oct: 0o{0:o}/hex: 0X{0:X}'.format(i)
> 'dec: 45/oct: 0o55/hex: 0X2D'
> 
> is this functionality being dropped, or am i missing something?  i
> didn't get anything from searching the Py3000 mailing list archives. i
> couldn't find anything in either formatter.h nor stringobject.c.

I don't see it as a big problem.  You can now use any prefix you want, 
instead of the hard coded values that # supplied.
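
For comparison, here is a sketch of both spellings. It assumes a released
Python 3 interpreter, where (unlike the build quoted above) the '#'
alternate form is accepted by str.format() and format() for integer
presentation types; the explicit-prefix spelling works either way.

```python
# Illustrative sketch, assuming a released Python 3 interpreter.
i = 45

plain = "dec: {0}/oct: {0:o}/hex: {0:X}".format(i)
alternate = "dec: {0}/oct: {0:#o}/hex: {0:#X}".format(i)   # '#' adds 0o / 0X
explicit = "dec: {0}/oct: 0o{0:o}/hex: 0X{0:X}".format(i)  # prefix typed by hand
lower_hex = format(i, "#x")                                # built-in format() too

print(plain)
print(alternate)
print(explicit)
print(lower_hex)
```

Typing the prefix by hand remains useful when a prefix other than the
built-in ones is wanted.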

> 
> secondly, and much more minor, is that i think there's a minor typo in the PEP:
> print format(10.0, "7.3g")  <-- print() is now a function so it needs
> another pair of ( ).

Fixed in r63786.  Thanks for catching it.  There was another print() 
function already in the PEP, so clearly the intent was to be 3.0 compliant.

Eric.

> 
> thanks,
> -- wesley
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> "Core Python Programming", Prentice Hall, (c)2007,2001
>  http://corepython.com
> 
> wesley.j.chun :: wescpy-at-gmail.com
> python training and technical consulting
> cyberweb.consulting : silicon valley, ca
> http://cyberwebconsulting.com


From g.brandl at gmx.net  Thu May 29 16:28:16 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Thu, 29 May 2008 16:28:16 +0200
Subject: [Python-3000] [Python-Dev] PyString -> PyBytes C API renaming
In-Reply-To: <g1lvbe$k98$1@ger.gmane.org>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>	<483D49ED.8060907@cheimes.de>	<483D5444.8000705@egenix.com>	<g1lpjv$vu3$1@ger.gmane.org>
	<483E7F08.80704@cheimes.de> <g1lvbe$k98$1@ger.gmane.org>
Message-ID: <g1mei6$b9n$1@ger.gmane.org>

Stefan Behnel schrieb:
> Christian Heimes wrote:
>> Stefan Behnel schrieb:
>>> M.-A. Lemburg wrote:
>>>> If you use PyBytes APIs, you expect to find PyBytes functions in
>>>> the libs and also set breakpoints on these.
>>> AFAICT, the PyBytes_* functions are in both Py2.6 and Py3 now, so no problem here.
>> 
>> In Python 2.6 the PyBytes_* functions are only available to the compiler
>> but not to the linker. In 2.6 the ABI functions are PyString_* and in
>> 3.0 it's PyBytes_*
> 
> Ah, even better then. Given that it was always PyString_*() in Py2, that
> totally sounds like the right thing to me. I really don't think anyone using
> the newly advertised Py3 PyBytes_*() C-API functions will honestly expect them
> to be available in a 2.x binary lib.

Can't we have the best of both worlds -- have the macro and a stub function
for the linker, as is done with PyErr_Warn?

Georg


From mal at egenix.com  Thu May 29 16:51:25 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 29 May 2008 16:51:25 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <08May28.100829pdt."58698"@synergy1.parc.xerox.com>
References: <48397ECC.9070805@cheimes.de>
	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>
	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>
	<483BDE11.509@egenix.com>	<483D300B.5090309@egenix.com>
	<08May28.100829pdt."58698"@synergy1.parc.xerox.com>
Message-ID: <483EC2ED.7010104@egenix.com>

On 2008-05-28 19:08, Bill Janssen wrote:
>> I'm beginning to wonder whether I'm the only one who cares about
>> the Python 2.x branch not getting cluttered up with artifacts caused
>> by a broken forward merge strategy.
> 
> I share your concern.  Seems to me that perhaps (not sure, but
> perhaps) the rush to back-port from 3.x, and the concern about
> minimizing pain of moving from 2.x to 3.x, has become the tail wagging
> the dog.

Indeed.

If the need to be able to forward merge changes from the 2.x trunk
to the 3.x branch is the only reason for the current approach, then
we need to find a better procedure for getting patches to 2.x
forwarded to 3.x.

I believe that everyone is aware that 3.x breaks things and that's
fine.

However, the reason for introducing such breakage in 3.x
is that users have the option to decide whether and when to switch
to the new major version.

Being able to play with 3.x features in 2.x is nice, but I wouldn't
really consider that essential for 2.x. It certainly doesn't
warrant causing major problems in the 2.x releases.

The module renaming backport was one example (which was undone again),
the C API renaming is another. I expect more such features to be
backported from 3.x to 2.x (even though I don't really think it's
worth the trouble), and since this always means that changes have
to be applied in two worlds, we'll need a better process for getting
changes in one major release ported to the other.

Simply tweaking 2.x into shape so that the rather simple-minded
SVN merge command works isn't a good enough procedure for this.

That's why I suggested using an intermediate form or branch
for the merging - one that implements 2.x with all renaming
and syntax fixes applied.

This would:

  * reduce the number of merge conflicts since the renaming
    would already have happened

  * reduce the patch sizes that have to be applied to 3.x in
    order to stay in sync with 2.x

  * result in a tool chain that makes it easier for all Python
    users to port their code to 3.x

  * simplify renaming or reorg of modules, functions, methods
    and C APIs without requiring major changes on either side

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 29 2008)
 >>> Python/Zope Consulting and Support ...        http://www.egenix.com/
 >>> mxODBC.Zope.Database.Adapter ...             http://zope.egenix.com/
 >>> mxODBC, mxDateTime, mxTextTools ...        http://python.egenix.com/
________________________________________________________________________
2008-07-07: EuroPython 2008, Vilnius, Lithuania            38 days to go

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


    eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
     D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
            Registered at Amtsgericht Duesseldorf: HRB 46611

From mal at egenix.com  Thu May 29 17:22:58 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 29 May 2008 17:22:58 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com>
References: <48397ECC.9070805@cheimes.de>
	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>
	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>
	<483BDE11.509@egenix.com>	<483D300B.5090309@egenix.com>
	<52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com>
Message-ID: <483ECA52.6040000@egenix.com>

On 2008-05-28 22:47, Gregory P. Smith wrote:
> On Wed, May 28, 2008 at 3:12 AM, M.-A. Lemburg <mal at egenix.com> wrote:
>> I'm beginning to wonder whether I'm the only one who cares about
>> the Python 2.x branch not getting cluttered up with artifacts caused
>> by a broken forward merge strategy.
>>
>> How can it be that we allow major C API changes such as the renaming
>> of the PyString APIs to go into the trunk without discussion or
>> a PEP ?
> 
> I do not consider it a C API change.  The API and ABI have not
> changed.  Old code still compiles.  Old binaries still dynamically
> load and work fine.  (I just confirmed this by importing a couple
> python2.4 .so files into my non-debug build of 2.6 trunk)
> 
> All of the PyString APIs are the real implementations in 2.x and are
> still there.  We only switched to using their PyBytes equivalent names
> within the Python trunk code base.
> 
> Are you objecting to our own code switching to use a different name
> even though the actual underlying API and ABI haven't changed?  I
> suppose to people reading the code and going against old reference
> books it could be confusing but they've got to get used to the new
> names somehow and sometime.
> 
> I strongly support changes like this one that makes the life of
> porting C code forwards and backwards between 2.x and 3.x easier
> without breaking compatibility with earlier 2.x version because that
> is going to be a serious pain for all of us otherwise.

Well, first of all, it is a change in the C API:
APIs have different names now, they live in different files,
the Python documentation doesn't apply anymore, books have to
be updated, programmers trained, etc. etc. That's fine for
3.x, it's not for 2.x.

Second, if you leave out the "ease merging" argument, all of
this is not really necessary in 2.x. If you absolutely want
to have PyBytes APIs in 2.x, then you can *add* them, without
removing the PyString APIs. We have done that on a smaller
scale a couple of times in the past (turned functions into
macros or vice-versa).

And finally, the "merge" argument itself is not really all that
strong. It's just a matter of getting the procedure corrected.
Then you can rename and restructure as much as you want in
3.x - without affecting the stability and matureness of the
2.x branch.

I expect more of these backports to happen, so we had better get
this right now instead of putting Python's reputation
as a stable and mature programming language at risk.

-- 
Marc-Andre Lemburg
eGenix.com


From lars at ibp.de  Thu May 29 16:56:20 2008
From: lars at ibp.de (Lars Immisch)
Date: Thu, 29 May 2008 16:56:20 +0200
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <8b943f2b0805290625w19ab1fd3l48f00f40e630c39d@mail.gmail.com>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>	<g1ll0v$i0p$1@ger.gmane.org>
	<8b943f2b0805290625w19ab1fd3l48f00f40e630c39d@mail.gmail.com>
Message-ID: <483EC414.7080603@ibp.de>

<snip>
>         Issue 2847 - the aifc module still imports the cl module in 3.0.
>         Problem is that the cl module is gone. =) So it seems silly to have
>         the imports lying about. This can probably be changed to critical.
> 
> 
>     It shouldn't be a problem to rip everything cl-related out of aifc.
>     The question is how useful aifc will be after that ...
> 
> 
> Has someone already used that module ? I took a look into it, but I'm a 
> bit confused about the various compression types, case-sensitivity and 
> compatibility issues [1]. Are Apple's "alaw" and SGI's "ALAW" really the 
> same encoding ? Can we use the audioop module for ALAW, just like it's 
> already done for ULAW ?

There is just one alaw I've ever come across (G.711), and the audioop 
implementation could be used (audioop's alaw support is younger than the 
aifc module, BTW)

The capitalisation is confusing, but your document [1] says: "Apple 
Computer's QuickTime player recognize only the Apple compression types. 
Although "ALAW" and "ULAW" contain identical sound samples to the "alaw" 
and "ulaw" formats and were in use long before Apple introduced the new 
codes,  QuickTime does not recognize them."

So this seems to be just a matter of naming in AIFC, not a matter of
two different alaw implementations.

- Lars

[1] http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/AIFF.html

From qgallet at gmail.com  Thu May 29 17:39:17 2008
From: qgallet at gmail.com (Quentin Gallet-Gilles)
Date: Thu, 29 May 2008 17:39:17 +0200
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <483EC414.7080603@ibp.de>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>
	<g1ll0v$i0p$1@ger.gmane.org>
	<8b943f2b0805290625w19ab1fd3l48f00f40e630c39d@mail.gmail.com>
	<483EC414.7080603@ibp.de>
Message-ID: <8b943f2b0805290839s7a1f3238g9e21407a56c34159@mail.gmail.com>

On Thu, May 29, 2008 at 4:56 PM, Lars Immisch <lars at ibp.de> wrote:

> <snip>
>
>>        Issue 2847 - the aifc module still imports the cl module in 3.0.
>>        Problem is that the cl module is gone. =) So it seems silly to have
>>        the imports lying about. This can probably be changed to critical.
>>
>>
>>    It shouldn't be a problem to rip everything cl-related out of aifc.
>>    The question is how useful aifc will be after that ...
>>
>>
>> Has someone already used that module ? I took a look into it, but I'm a
>> bit confused about the various compression types, case-sensitivity and
>> compatibility issues [1]. Are Apple's "alaw" and SGI's "ALAW" really the
>> same encoding ? Can we use the audioop module for ALAW, just like it's
>> already done for ULAW ?
>>
>
> There is just one alaw I've ever come across (G.711), and the audioop
> implementation could be used (audioop's alaw support is younger than the
> aifc module, BTW)
>
> The capitalisation is confusing, but your document [1] says: "Apple
> Computer's QuickTime player recognize only the Apple compression types.
> Although "ALAW" and "ULAW" contain identical sound samples to the "alaw" and
> "ulaw" formats and were in use long before Apple introduced the new codes,
>  QuickTime does not recognize them."
>
> So this seems just a matter of naming in the AIFC, but not a matter of two
> different alaw implementations.
>
> - Lars
>

Ok, I'll handle this issue. I'll be using the audioop implementation as a
replacement for the SGI compression library. I'll also create a test suite,
since Brett mentioned in the bug tracker that the module was missing one.

Quentin


>
> [1] http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/AIFF/AIFF.html
>

From lists at cheimes.de  Thu May 29 17:45:24 2008
From: lists at cheimes.de (Christian Heimes)
Date: Thu, 29 May 2008 17:45:24 +0200
Subject: [Python-3000] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483ECA52.6040000@egenix.com>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>	<483D300B.5090309@egenix.com>	<52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com>
	<483ECA52.6040000@egenix.com>
Message-ID: <483ECF94.7060607@cheimes.de>

M.-A. Lemburg schrieb:
> Well, first of all, it is a change in the C API:
> APIs have different names now, they live in different files,
> the Python documentation doesn't apply anymore, books have to
> be updated, programmers trained, etc. etc. That's fine for
> 3.x, it's not for 2.x.

No, that's not correct. The 2.x API is still the same. I've only changed
the internal code.

> Second, if you leave out the "ease merging" argument, all of
> this is not really necessary in 2.x. If you absolutely want
> to have PyBytes APIs in 2.x, then you can *add* them, without
> removing the PyString APIs. We have done that on a smaller
> scale a couple of times in the past (turned functions into
> macros or vice-versa).

The PyString methods are still available and the official API for
dealing with str objects in 2.x.

> And finally, the "merge" argument itself is not really all that
> strong. It's just a matter of getting the procedure corrected.
> Then you can rename and restructure as much as you want in
> 3.x - without affecting the stability and matureness of the
> 2.x branch.

I'm volunteering to revert my changes if you are volunteering to keep
the Python 2.x series in sync with the 3.x series.

Christian

From brett at python.org  Thu May 29 19:32:04 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 29 May 2008 10:32:04 -0700
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <g1ll0v$i0p$1@ger.gmane.org>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>
	<g1ll0v$i0p$1@ger.gmane.org>
Message-ID: <bbaeab100805291032v7749ba6n1752a0ef5be9cbca@mail.gmail.com>

On Thu, May 29, 2008 at 12:12 AM, Georg Brandl <g.brandl at gmx.net> wrote:
> Brett Cannon schrieb:
>>
>> The issues related to PEP 3108 now total 14. With the beta
>> (supposedly) in a week, I am hoping the last minor details can be
>> pulled together or decisions made on what can be postponed and what
>> should definitely be considered a release blocker.
>>
>> Issue 2847 - the aifc module still imports the cl module in 3.0.
>> Problem is that the cl module is gone. =) So it seems silly to have
>> the imports lying about. This can probably be changed to critical.
>
> It shouldn't be a problem to rip everything cl-related out of aifc.
> The question is how useful aifc will be after that ...
>

If it ends up not being useful then the module can just go.

>> Issue 2848 - mimetools has been deprecated for a while, but it is
>> still used in a bunch of places. Since this has been deprecated in PEP
>> 4 for a long time, should we add the removal warning in 2.6 now and
>> then make its actual removal of usage something to do by another beta?
>>
>> Issue 2849 - rfc822 is the same problem as mimetools.
>
> The problem is that nobody seems to know what exactly distinguishes
> mimetools/rfc822' classes and its successor's (email's) classes, so
> it's hard to replace it in the stdlib.
>

Right. I have looked myself over the years and it never seemed
brain-dead simple.

>> Issue 2873 - htmllib is slated to go, but pydoc still uses it. Then
>> again, pydoc is busted thanks to the new doc format.
>
> I will try to handle this in the coming week.
>

Fred had the interesting suggestion of removing pydoc in Py3K based on
the thinking that documentation tools like pydoc should be external to
Python. With the docs now so easy to generate directly, should pydoc
perhaps just be gutted to only what is needed for help() to work?

>> Issue 2919 - profile and cProfile needs to be merged. This has not
>> been dealt with yet. Would it be reasonable to deprecate importing
>> cProfile directly in 2.6 with the assumption the merge will work out
>> for 3.0?
>
> That's not the right way to go, you don't want to deprecate cStringIO
> or cPickle either.
>

Yeah, sorry, you're right. Guess my brain was not fully working when I
wrote that. =)

>> So that is everything that's left. Issue 2775 is the tracking issue so
>> you can look there to see what issues are still open and need work. I
>> was hoping to spend Monday and Tuesday trying to tie up as many loose
>> ends as possible, but the conference paper I have been working on that
>> was due Sunday is now due a week later, and so Monday and Tuesday will
>> be spent on that (supervisor's orders). Plus I am flying out Wednesday
>> for 10 days to help my mother move and I don't know when I will get
>> Net again. In other words, I still need help. =)
>
> Let's hope we get this right in time.
>
> Then again, there are lots of other release blockers, so it may well be
> that the beta is delayed by some time.

Guess it depends on the whim of the release manager. =)

-Brett

From allyourcode at gmail.com  Thu May 29 19:51:13 2008
From: allyourcode at gmail.com (allyourcode at gmail.com)
Date: Thu, 29 May 2008 10:51:13 -0700
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <fa7d4c4f0805290150u68f98a1m66963db3360e4880@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
	<ca471dc20805282147j1cefed8flc0749f95beb82214@mail.gmail.com>
	<7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com>
	<fa7d4c4f0805290150u68f98a1m66963db3360e4880@mail.gmail.com>
Message-ID: <7c8225f20805291051x3f81b716p5caadafefc4af4ed@mail.gmail.com>

This is in response to Stefan Behnel, who wrote

----

Tutorial section on "tuples and sequences", not quite the most hidden place in
the universe.

http://docs.python.org/tut/node7.html#SECTION007300000000000000000

Stefan

----

I just read that section twice and nowhere does it mention that
Python does what I was suggesting. In fact, by the discussion on
sequence unpacking, it seems to imply that Python does *not* do what I
wanted.

PS: Sorry for letting all of this sarcasm get under my skin and clog
up the mailing list, but it's really mean-spirited and unnecessary.

From wescpy at gmail.com  Thu May 29 20:06:58 2008
From: wescpy at gmail.com (wesley chun)
Date: Thu, 29 May 2008 11:06:58 -0700
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <483E7BB9.5060002@gmail.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
	<483E7BB9.5060002@gmail.com>
Message-ID: <78b3a9580805291106v283d8502gc22eae871e0f5aa6@mail.gmail.com>

> wesley chun wrote:
>>
>> i have to resort to the uglier:
>> >>> 'dec: {0}/oct: 0o{0:o}/hex: 0X{0:X}'.format(i)
>> 'dec: 45/oct: 0o55/hex: 0X2D'


[Nick Coghlan <ncoghlan at gmail.com>]:
> Is being explicit about the displayed prefix really that much uglier? The
> old # alternative display formats were somewhat arbitrary.

[Eric Smith <eric+python-dev at trueblade.com>]:
> I don't see it as a big problem.  You can now use any prefix you want,
> instead of the hard coded values that # supplied.

based on both your replies, it sounds like it's going away!  :-)
no, i don't have a problem with it. however, it'd be nice to put
something about this in the PEP in case anyone else wonders/asks.


>> print format(10.0, "7.3g")

[Nick Coghlan <ncoghlan at gmail.com>]:
> It works fine as written in 2.x :)
> (but, yes, you're right that as a 3000-series PEP, 3101 should probably
> treat print() as a function in its examples)

[Eric Smith <eric+python-dev at trueblade.com>]:
> Fixed in r63786.  Thanks for catching it.  There was another print()
> function already in the PEP, so clearly the intent was to be 3.0 compliant.

another 2 suggestions then (only pick one):

1. if both str.format() and format() are going to be backported to
    2.x, there should be an example of it there too (see below where
    i'm also taking an additional liberty of changing "g" to "f"
    which i use more and give another number as an example):

    2.x:
    >>> print format(10.8765, '7.2f')
      10.88

    3.x:
    >>> print(format(10.8765, '7.2f'))
      10.88

2. drop the print altogether, esp. since this is about strings.

    >>> format(10.8765, '7.2f')
    '  10.88'

cheers,
-wesley

From mal at egenix.com  Thu May 29 20:08:57 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Thu, 29 May 2008 20:08:57 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483ECF94.7060607@cheimes.de>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>	<483D300B.5090309@egenix.com>	<52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com>	<483ECA52.6040000@egenix.com>
	<483ECF94.7060607@cheimes.de>
Message-ID: <483EF139.8000606@egenix.com>

Christian,

so far you have not responded to any of the suggestions made on
this thread, only defended your checkin. That's not very helpful
in getting to some conclusion.

* What's so hard about going with a proper, standard solution that
doesn't involve using your preprocessor hack ?

* Why can't we have both PyString *and* PyBytes exposed in 2.x,
with one redirecting to the other ?

* Why should the 2.x code base turn to hacks, just because 3.x wants
to restructure itself ?

* Why aren't you even considering my proposed solution for this
whole renaming and reorg problem ?

BTW: Is there some PEP or wiki page explaining how you actually
implement the merging from 2.x to 3.x ? I'm still under the assumption
that you're only using svnmerge.py for this and doing straight
merging from the trunk to the branch.

Not sure how others feel about it, but if the only option you would
feel comfortable with is not having the 3.x renaming backported,
then I'd rather go with that, really. It's easy enough to add
a header file to map PyString APIs to PyBytes if you want to
port an extension to 3.x.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



On 2008-05-29 17:45, Christian Heimes wrote:
> M.-A. Lemburg schrieb:
>> Well, first of all, it is a change in the C API:
>> APIs have different names now, they live in different files,
>> the Python documentation doesn't apply anymore, books have to
>> be updated, programmers trained, etc. etc. That's fine for
>> 3.x, it's not for 2.x.
> 
> No, that's not correct. The 2.x API is still the same. I've only changed
> the internal code.
> 
>> Second, if you leave out the "ease merging" argument, all of
>> this is not really necessary in 2.x. If you absolutely want
>> to have PyBytes APIs in 2.x, then you can *add* them, without
>> removing the PyString APIs. We have done that on a smaller
>> scale a couple of times in the past (turned functions into
>> macros or vice-versa).
> 
> The PyString methods are still available and the official API for
> dealing with str objects in 2.x.
> 
>> And finally, the "merge" argument itself is not really all that
>> strong. It's just a matter of getting the procedure corrected.
>> Then you can rename and restructure as much as you want in
>> 3.x - without affecting the stability and matureness of the
>> 2.x branch.
> 
> I'm volunteering to revert my changes if you are volunteering to keep
> the Python 2.x series in sync with the 3.x series.
> 
> Christian


From qrczak at knm.org.pl  Thu May 29 20:16:28 2008
From: qrczak at knm.org.pl (=?UTF-8?Q?Marcin_=E2=80=98Qrczak=E2=80=99_Kowalczyk?=)
Date: Thu, 29 May 2008 20:16:28 +0200
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <483EAF95.5050503@trueblade.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
	<483EAF95.5050503@trueblade.com>
Message-ID: <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>

2008/5/29 Eric Smith <eric+python-dev at trueblade.com>:

> I don't see it as a big problem.  You can now use any prefix you want,
> instead of the hard coded values that # supplied.

Except that it works incorrectly for negative numbers.
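
To illustrate the point, here is a minimal sketch (the value -45 is only
an assumed example, not taken from the thread). A hand-written prefix ends
up in front of the minus sign, whereas the old %-style alternate form puts
the sign first:

```python
# A literal prefix in the format string is emitted before the sign.
i = -45
manual = '0x{0:x}'.format(i)

# For comparison, the %-style alternate form places the sign first.
percent = '%#x' % i

print(manual)   # 0x-2d
print(percent)  # -0x2d
```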

-- 
Marcin Kowalczyk
qrczak at knm.org.pl
http://qrnik.knm.org.pl/~qrczak/

From guido at python.org  Thu May 29 21:19:52 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 29 May 2008 12:19:52 -0700
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805291051x3f81b716p5caadafefc4af4ed@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
	<ca471dc20805282147j1cefed8flc0749f95beb82214@mail.gmail.com>
	<7c8225f20805282152t2eb0a9f4s850a36079bc4fa39@mail.gmail.com>
	<fa7d4c4f0805290150u68f98a1m66963db3360e4880@mail.gmail.com>
	<7c8225f20805291051x3f81b716p5caadafefc4af4ed@mail.gmail.com>
Message-ID: <ca471dc20805291219q689b01paf12948c7b0c6093@mail.gmail.com>

On Thu, May 29, 2008 at 10:51 AM,  <allyourcode at gmail.com> wrote:
> This is in response to Stefan Behnel, who wrote
>
> ----
>
> Tutorial section on "tuples and sequences", not quite the most hidden place in
> the universe.
>
> http://docs.python.org/tut/node7.html#SECTION007300000000000000000
>
> Stefan
>
> ----
>
> I just read that section twice and nowhere does it mention that
> Python does what I was suggesting. In fact, by the discussion on
> sequence unpacking, it seems to imply that Python does *not* do what I
> wanted.

It implies no such thing. It says "tuples may be nested" but fails to
show an example of a nested tuple on the LHS of an assignment. I don't
see how you can draw the conclusion from this that it's not supported.
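
For reference, a nested target on the left-hand side of an assignment can
be sketched as follows (the variable names are made up for illustration):

```python
# Nested sequence unpacking: the target list on the left mirrors the
# structure of the value on the right.
point_name, (x, y) = 'origin', (0, 0)

# Nesting can go more than one level deep.
a, (b, (c, d)) = 1, (2, (3, 4))

print(point_name, x, y)
print(a, b, c, d)
```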

> PS: Sorry for letting all of this sarcasm get under my skin and clog
> up the mailing list, but it's really mean-spirited and unnecessary.

You can't very well demand that a tutorial have examples of every
feature. Perhaps (continuing the sarcasm for a bit) you also didn't
think that tuples could be nested more than one level, since there is
no example of that? Plus, you could have tried this in the interactive
interpreter in 10 seconds rather than spawning a long thread.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From phd at phd.pp.ru  Thu May 29 21:21:57 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 29 May 2008 23:21:57 +0400
Subject: [Python-3000] PEP: str(container) should call str(item),
	not repr(item)
Message-ID: <20080529192157.GA17896@phd.pp.ru>

Hello. A draft for a discussion.

PEP: XXX
Title: str(container) should call str(item), not repr(item)
Version: $Revision$
Last-Modified: $Date$
Author: Oleg Broytmann <phd at phd.pp.ru>,
        Jim Jewett <jimjjewett at gmail.com>
Discussions-To: python-3000 at python.org
Status: Draft
Type: Standards Track
Content-Type: text/plain
Created: 27-May-2008
Post-History: 28-May-2008


Abstract

    This document discusses the advantages and disadvantages of the
    current implementation of str(container).  It also discusses the
    pros and cons of a different approach - to call str(item) instead
    of repr(item).


Motivation

    Currently str(container) calls repr on items.  Arguments for it:
    -- containers refuse to guess what the user wants to see on
       str(container) - surroundings, delimiters, and so on;
    -- repr(item) usually displays type information - apostrophes
       around strings, class names, etc.

    Arguments against:
    -- it's illogical; str() is expected to call __str__ if it exists,
       not __repr__;
    -- there is no standard way to print a container's content calling
       items' __str__, that's inconvenient in cases where __str__ and
       __repr__ return different results;
    -- repr(item) sometimes does the wrong thing (e.g. hex-escapes
       non-ascii strings).

    This PEP proposes to change how str(container) works.  It is
    proposed to mimic how repr(container) works except for one detail
    - call str on items instead of repr.  This allows a user to choose
    what results she wants to get - from item.__repr__ or item.__str__.


Current situation

    Most container types (tuples, lists, dicts, sets, etc.) do not
    implement a __str__ method, so str(container) calls
    container.__repr__, and container.__repr__, once called, forgets
    it is called from str and always calls repr on the container's
    items.

    This behaviour has advantages and disadvantages.  One advantage is
    that most items are represented with type information - strings
    are surrounded by apostrophes, instances may have both class name
    and instance data:

        >>> print([42, '42'])
        [42, '42']
        >>> print([Decimal('42'), datetime.now()])
        [Decimal("42"), datetime.datetime(2008, 5, 27, 19, 57, 43, 485028)]

    The disadvantage is that __repr__ often returns technical data
    (like '<object at address>') or an unreadable string (a hex-escaped
    string if the input is a non-ascii string):

        >>> print(['тест'])
        ['\xd4\xc5\xd3\xd4']

    One of the motivations for PEP 3138 is that neither repr nor str
    will allow the sensible printing of dicts whose keys are non-ascii
    text strings.  Now that unicode identifiers are allowed, this
    includes Python's own attribute dicts.  It also includes JSON
    serialization (and forced the json lib through some hoops).

    PEP 3138 proposes to fix this by breaking the "repr is safe ASCII"
    invariant, and changing the way repr (which is used for
    persistence) outputs some objects, with system-dependent failures.

    Changing how str(container) works would allow easy debugging in
    the normal case, and retain the safety of ASCII-only for the
    machine-readable case.  The only downside is that str(x) and
    repr(x) would more often be different -- but only in those cases
    where the current almost-the-same version is insufficient.

    It also seems illogical that str(container) calls repr on items
    instead of str.  It's only logical to expect the following code

        class Test:
            def __str__(self):
                return "STR"

            def __repr__(self):
                return "REPR"


        test = Test()
        print(test)
        print(repr(test))
        print([test])
        print(str([test]))

    to print

        STR
        REPR
        [STR]
        [STR]

    where it actually prints

        STR
        REPR
        [REPR]
        [REPR]

    It is especially illogical to see that print in Python 2 uses str
    if it is called on what seems to be a tuple:

        >>> print Decimal('42'), datetime.now()
        42 2008-05-27 20:16:22.534285

    where on an actual tuple it prints

        >>> print((Decimal('42'), datetime.now()))
        (Decimal("42"), datetime.datetime(2008, 5, 27, 20, 16, 27, 937911))


A different approach - call str(item)

    For example, with numbers it is often only the value that people
    care about.

        >>> print Decimal('3')
        3

    But putting the value in a list forces users to read the type
    information, exactly as if repr had been called for the benefit of
    a machine:

        >>> print [Decimal('3')]
        [Decimal("3")]

    After this change, the type information would not clutter the str
    output:

        >>> print "{0}".format([Decimal('3')])
        [3]
        >>> str([Decimal('3')])  # ==
        [3]

    But it would still be available if desired:

        >>> print "{0!r}".format([Decimal('3')])
        [Decimal('3')]
        >>> repr([Decimal('3')])  # ==
        [Decimal('3')]

    There are a number of strategies to fix the problem.  The most
    radical is to change __repr__ so it accepts a new parameter (flag)
    "called from str, so call str on items, not repr".  The
    drawback of the proposal is that every __repr__ implementation
    must be changed.  Introspection could help a bit (inspect __repr__
    before calling it to see whether it accepts 2 or 3 parameters), but
    introspection doesn't work on classes written in C, like all builtin
    containers.

    A less radical proposal is to implement __str__ methods for builtin
    container types.  The obvious drawback is a duplication of effort
    - all those __str__ and __repr__ implementations differ only
    in one small detail - whether they call str or repr on items.
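
    As a rough sketch of what such a __str__ could look like, written
    here in pure Python rather than C: StrList is a hypothetical
    subclass invented for illustration, not part of the proposal, and
    Test is the same toy class used in the earlier example.

```python
class StrList(list):
    """Hypothetical list subclass whose str() calls str() on items."""

    def __str__(self):
        return '[' + ', '.join(str(item) for item in self) + ']'


class Test:
    def __str__(self):
        return "STR"

    def __repr__(self):
        return "REPR"


items = StrList([Test(), 42, 'text'])
print(str(items))   # [STR, 42, text]
print(repr(items))  # [REPR, 42, 'text']
```

    The inherited __repr__ is untouched, so the two methods differ only
    in which function they apply to the items.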

    The most conservative proposal is not to change str at all but
    to allow developers to implement their own application- or
    library-specific pretty-printers.  The drawback is again
    a multiplication of effort and proliferation of many small
    specific container-traversal algorithms.
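
    An application-specific pretty-printer of this kind might look
    roughly like the sketch below.  The name str_container is
    hypothetical, only dicts, lists and tuples are handled, and
    self-referencing containers are not detected.

```python
from decimal import Decimal


def str_container(obj):
    """Hypothetical helper: format containers using str() on leaf items."""
    if isinstance(obj, dict):
        body = ', '.join(
            '%s: %s' % (str_container(k), str_container(v))
            for k, v in obj.items()
        )
        return '{' + body + '}'
    if isinstance(obj, list):
        return '[' + ', '.join(str_container(x) for x in obj) + ']'
    if isinstance(obj, tuple):
        body = ', '.join(str_container(x) for x in obj)
        # Keep the trailing comma that marks a one-element tuple.
        return '(' + body + (',)' if len(obj) == 1 else ')')
    return str(obj)


print(str_container([Decimal('3'), (1,), {'k': [Decimal('4')]}]))
# [3, (1,), {k: [4]}]
```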


Backward compatibility

    In those cases where type information is more important than
    usual, it will still be possible to get the current results by
    calling repr explicitly.


Copyright

    This document has been placed in the public domain.


Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End:

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From guido at python.org  Thu May 29 21:31:17 2008
From: guido at python.org (Guido van Rossum)
Date: Thu, 29 May 2008 12:31:17 -0700
Subject: [Python-3000] PEP: str(container) should call str(item),
	not repr(item)
In-Reply-To: <20080529192157.GA17896@phd.pp.ru>
References: <20080529192157.GA17896@phd.pp.ru>
Message-ID: <ca471dc20805291231k56621365ud6b05c97f0659c01@mail.gmail.com>

Let me just save everyone a lot of time and say that I'm opposed to
this change, and that I believe that it would cause way too much
disturbance to be accepted this close to beta.

--Guido

On Thu, May 29, 2008 at 12:21 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:
> Hello. A draft for a discussion.
>
> PEP: XXX
> Title: str(container) should call str(item), not repr(item)
> Version: $Revision$
> Last-Modified: $Date$
> Author: Oleg Broytmann <phd at phd.pp.ru>,
>        Jim Jewett <jimjjewett at gmail.com>
> Discussions-To: python-3000 at python.org
> Status: Draft
> Type: Standards Track
> Content-Type: text/plain
> Created: 27-May-2008
> Post-History: 28-May-2008
>
>
> Abstract
>
>    This document discusses the advantages and disadvantages of the
>    current implementation of str(container).  It also discusses the
>    pros and cons of a different approach - to call str(item) instead
>    of repr(item).
>
>
> Motivation
>
>    Currently str(container) calls repr on items.  Arguments for it:
>    -- containers refuse to guess what the user wants to see on
>       str(container) - surroundings, delimiters, and so on;
>    -- repr(item) usually displays type information - apostrophes
>       around strings, class names, etc.
>
>    Arguments against:
>    -- it's illogical; str() is expected to call __str__ if it exists,
>       not __repr__;
>    -- there is no standard way to print a container's content calling
>       items' __str__, that's inconvenient in cases where __str__ and
>       __repr__ return different results;
>    -- repr(item) sometimes does the wrong thing (hex-escapes non-ascii
>       strings, e.g.)
>
>    This PEP proposes to change how str(container) works.  It is
>    proposed to mimic how repr(container) works except for one detail
>    - call str on items instead of repr.  This allows a user to choose
>    what results she wants to get - from item.__repr__ or item.__str__.
>
>
> Current situation
>
>    Most container types (tuples, lists, dicts, sets, etc.) do not
>    implement a __str__ method, so str(container) calls
>    container.__repr__, and container.__repr__, once called, forgets
>    it is called from str and always calls repr on the container's
>    items.
>
>    This behaviour has advantages and disadvantages.  One advantage is
>    that most items are represented with type information - strings
>    are surrounded by apostrophes, instances may have both class name
>    and instance data:
>
>        >>> print([42, '42'])
>        [42, '42']
>        >>> print([Decimal('42'), datetime.now()])
>        [Decimal("42"), datetime.datetime(2008, 5, 27, 19, 57, 43, 485028)]
>
>    The disadvantage is that __repr__ often returns technical data
>    (like '<object at address>') or an unreadable string (a
>    hex-escaped string if the input is a non-ascii string):
>
>        >>> print(['тест'])
>        ['\xd4\xc5\xd3\xd4']
>
>    One of the motivations for PEP 3138 is that neither repr nor str
>    will allow the sensible printing of dicts whose keys are non-ascii
>    text strings.  Now that unicode identifiers are allowed, it
>    includes Python's own attribute dicts.  This also includes JSON
>    serialization (and caused some hoops for the json lib).
>
>    PEP 3138 proposes to fix this by breaking the "repr is safe ASCII"
>    invariant, and changing the way repr (which is used for
>    persistence) outputs some objects, with system-dependent failures.
>
>    Changing how str(container) works would allow easy debugging in
>    the normal case, and retain the safety of ASCII-only for the
>    machine-readable case.  The only downside is that str(x) and
>    repr(x) would more often be different -- but only in those cases
>    where the current almost-the-same version is insufficient.
>
>    It also seems illogical that str(container) calls repr on items
>    instead of str.  It's only logical to expect the following code
>
>        class Test:
>            def __str__(self):
>                return "STR"
>
>            def __repr__(self):
>                return "REPR"
>
>
>        test = Test()
>        print(test)
>        print(repr(test))
>        print([test])
>        print(str([test]))
>
>    to print
>
>        STR
>        REPR
>        [STR]
>        [STR]
>
>    where it actually prints
>
>        STR
>        REPR
>        [REPR]
>        [REPR]
>
>    It is especially illogical to see that print in Python 2 uses str
>    if it is called on what seems to be a tuple:
>
>        >>> print Decimal('42'), datetime.now()
>        42 2008-05-27 20:16:22.534285
>
>    where on an actual tuple it prints
>
>        >>> print((Decimal('42'), datetime.now()))
>        (Decimal("42"), datetime.datetime(2008, 5, 27, 20, 16, 27, 937911))
>
>
> A different approach - call str(item)
>
>    For example, with numbers it is often only the value that people
>    care about.
>
>        >>> print Decimal('3')
>        3
>
>    But putting the value in a list forces users to read the type
>    information, exactly as if repr had been called for the benefit of
>    a machine:
>
>        >>> print [Decimal('3')]
>        [Decimal("3")]
>
>    After this change, the type information would not clutter the str
>    output:
>
>        >>> print "{0}".format([Decimal('3')])
>        [3]
>        >>> str([Decimal('3')])  # ==
>        [3]
>
>    But it would still be available if desired:
>
>        >>> print "{0!r}".format([Decimal('3')])
>        [Decimal('3')]
>        >>> repr([Decimal('3')])  # ==
>        [Decimal('3')]
>
>    There are a number of strategies to fix the problem.  The most
>    radical is to change __repr__ so it accepts a new parameter (flag)
>    "called from str, so call str on items, not repr".  The
>    drawback of the proposal is that every __repr__ implementation
>    must be changed.  Introspection could help a bit (inspect __repr__
>    before calling it to see if it accepts 2 or 3 parameters), but
>    introspection doesn't work on classes written in C, like all
>    builtin containers.
>
>    A less radical proposal is to implement __str__ methods for
>    builtin container types.  The obvious drawback is a duplication
>    of effort - all those __str__ and __repr__ implementations differ
>    only in one small detail - whether they call str or repr on items.
>
>    The most conservative proposal is not to change str at all but
>    to allow developers to implement their own application- or
>    library-specific pretty-printers.  The drawback is again
>    a multiplication of effort and proliferation of many small
>    specific container-traversal algorithms.
>
>
> Backward compatibility
>
>    In those cases where type information is more important than
>    usual, it will still be possible to get the current results by
>    calling repr explicitly.
>
>
> Copyright
>
>    This document has been placed in the public domain.
>
>
>
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8
> End:
>
> Oleg.
> --
>     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
>           Programmers don't die, they just GOSUB without RETURN.
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe: http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Thu May 29 21:41:51 2008
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 29 May 2008 15:41:51 -0400
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>	<483EAF95.5050503@trueblade.com>
	<3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>
Message-ID: <483F06FF.9090007@trueblade.com>

Marcin 'Qrczak' Kowalczyk wrote:
> 2008/5/29 Eric Smith <eric+python-dev at trueblade.com>:
> 
>> I don't see it as a big problem.  You can now use any prefix you want,
>> instead of the hard coded values that # supplied.
> 
> Except that it works incorrectly for negative numbers.

Excellent point.  If only this had been brought up back when the PEP was 
written :(

Any suggestions on how to improve the situation?  I guess we could add 
'#' back in to the format specifier.  I can't really think of any other 
way that doesn't involve converting the number to a string and then 
operating on that, just to get the sign.

I'm reasonably sure I could implement that before the beta (next 
Wednesday) if a decision is reached before this weekend.

Eric.


From stephen at xemacs.org  Thu May 29 22:00:35 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 30 May 2008 05:00:35 +0900
Subject: [Python-3000] PEP 3138- String representation in Python 3000
In-Reply-To: <fb6fbf560805271208xa425820tb1fed0e1ae1859d1@mail.gmail.com>
References: <797440730805060655j3250ab22le3220c76e5403ea4@mail.gmail.com>
	<ca471dc20805230722j54369d3bj7202a86656e601f5@mail.gmail.com>
	<797440730805230805w442df1b9wd1db265d9260f9df@mail.gmail.com>
	<87hccon44x.fsf@uwakimon.sk.tsukuba.ac.jp>
	<797440730805231904y501d310fw124ccd0e37defd3b@mail.gmail.com>
	<87d4ncm4ag.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805240953p1287cc6ai52e5336c4ef46618@mail.gmail.com>
	<878wxyn73s.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805261626u62a17a8cp683e3585e76ecc98@mail.gmail.com>
	<87k5hgsb3e.fsf@uwakimon.sk.tsukuba.ac.jp>
	<fb6fbf560805271208xa425820tb1fed0e1ae1859d1@mail.gmail.com>
Message-ID: <87y75skhi4.fsf@uwakimon.sk.tsukuba.ac.jp>

Jim Jewett writes:
 > On 5/26/08, Stephen J. Turnbull <stephen at xemacs.org> wrote:
 > > Jim Jewett writes:
 > 
 > >   > The only reason for this change is that __repr__ gets used when
 > >   > __str__ *should* be used instead.
 > 
 > > That's not what the advocates say.
 > 
 > I still haven't seen a use case where it *should* be using repr *and*
 > needs to print outside of ASCII.

I suggest that's because you rarely (if ever) read program or program
*textual* input or output that's not written in ASCII.

 > >  Now, I agree with you about what's "safe".  However, in a text-
 > >  processing application in a Japanese environment, that's hardly
 > >  useful, and our Japanese programmer can argue that in his environment,
 > >  printing all of Unicode *is* safe.
 > 
 > I think he or she will still be wrong, because of confusables -- it is
 > just that "unsafe" characters are far more rare (since byte value
 > alone isn't a problem) and the cost of not printing non-ASCII
 > characters is higher.

AFAIK confusables in strings are generally not a problem, that's part
of what I mean by "environment".  If they are, then you probably need
to set up special controls in the environment anyway, and Python
giving you Unicode escapes instead of glyphs is redundant.

 > > I don't use it myself other than as a way of diagnosing bugs in
 > >  programs I write or maintain; in personal practice, I'm in your camp.
 > >  But my understanding is that there is often an intermediate level,
 > >  such as a website admin, who needs *some* of the precision of repr()
 > >  such as escaped representation of whitespace, but also needs to be
 > >  able read most of the output.
 > 
 > Could someone who does need this explain more?

I don't think that's useful.  See below.

 > I don't understand needing *exactly* whitespace escaped, but not, say,
 > stray characters from scripts you've never used, even though the rest
 > of the page *is* in an expected script.

Of course *everybody* wants *stray* characters escaped!  The problem is
that to a Japanese, the 21000 kanji are *not* stray characters.  To a
Korean, the 21000 kanji and the 11000 Hangul are not stray
characters.  Etc.

So the first question is "can repr()'s printable repertoire usefully
be made locale-dependent?", and the answer is emphatically "no".
(I'm pretty sure that's a pronouncement from Guido, I could look it up
later.)

The next question is "what is the most useful compromise?", and the
candidates are "ASCII" and "all of Unicode".  You want the former, and
the 5.7 billion people whose native language is not American English
want the latter.  I don't know about the other 300 million
Americans.<wink>

From phd at phd.pp.ru  Thu May 29 21:57:57 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Thu, 29 May 2008 23:57:57 +0400
Subject: [Python-3000] PEP: str(container) should call str(item),
	not repr(item)
In-Reply-To: <ca471dc20805291231k56621365ud6b05c97f0659c01@mail.gmail.com>
References: <20080529192157.GA17896@phd.pp.ru>
	<ca471dc20805291231k56621365ud6b05c97f0659c01@mail.gmail.com>
Message-ID: <20080529195757.GB17896@phd.pp.ru>

On Thu, May 29, 2008 at 12:31:17PM -0700, Guido van Rossum wrote:
> Let me just save everyone a lot of time and say that I'm opposed to
> this change, and that I believe that it would cause way too much
> disturbance to be accepted this close to beta.
> 
> On Thu, May 29, 2008 at 12:21 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:
> > PEP: XXX
> > Title: str(container) should call str(item), not repr(item)

   That's ok. A rejected PEP has its purpose, too. It will rest peacefully
in the archive, holding all arguments consolidated and will serve as a point
of reference.
   Any objection if I demand it be properly registered, assigned a number
and then rejected?

PS. Am I the champion whose PEP has been killed before I even finished it? ;)

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From stephen at xemacs.org  Thu May 29 22:15:47 2008
From: stephen at xemacs.org (Stephen J. Turnbull)
Date: Fri, 30 May 2008 05:15:47 +0900
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
	<72D3D981-71D6-4A3B-8AF2-5C9D9D3A17EA@gmail.com>
	<7c8225f20805282134g624b0d10ueaff0f02b8d2f445@mail.gmail.com>
Message-ID: <87wslckgss.fsf@uwakimon.sk.tsukuba.ac.jp>

allyourcode at gmail.com writes:

 > Well, I'm sorry for bothering his majesty with such a stupid idea. At
 > least one other person didn't know about it either...
 > 
 > On 5/28/08, Mike Klaas <mike.klaas at gmail.com> wrote:

 > > I find it hard to believe that you have even attempted this, which has
 > > been valid in python for ages:

Um, stupidity (in the sense of not understanding all the implications
of the grammar) or ignorance (of the relevant section of the docs) is
not the point.  The point is that the proposed syntax (a) might
already mean something (even the semantics you suggest, in which case
you should say "d'oh, thanks!" when it is pointed out), or (b) not be
feasible for reasons that become obvious when you see the error
message that is emitted when you try it.  So try it before posting.


From musiccomposition at gmail.com  Thu May 29 22:49:08 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Thu, 29 May 2008 15:49:08 -0500
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>
Message-ID: <1afaf6160805291349i6ca25d6fhef22eb6c3abd5a7d@mail.gmail.com>

On Wed, May 28, 2008 at 11:38 PM, Brett Cannon <brett at python.org> wrote:
>
> Issue 2854 - gestalt needs to be added back into 3.0. This is
> Benjamin's issue. =)

Is that your way of saying "Check in the patch!" ? :)


-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From brett at python.org  Thu May 29 23:01:52 2008
From: brett at python.org (Brett Cannon)
Date: Thu, 29 May 2008 14:01:52 -0700
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <1afaf6160805291349i6ca25d6fhef22eb6c3abd5a7d@mail.gmail.com>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>
	<1afaf6160805291349i6ca25d6fhef22eb6c3abd5a7d@mail.gmail.com>
Message-ID: <bbaeab100805291401p17749628ucb6b997825369314@mail.gmail.com>

On Thu, May 29, 2008 at 1:49 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> On Wed, May 28, 2008 at 11:38 PM, Brett Cannon <brett at python.org> wrote:
>>
>> Issue 2854 - gestalt needs to be added back into 3.0. This is
>> Benjamin's issue. =)
>
> Is that your way of saying "Check in the patch!" ? :)
>

More or less; specifically, "don't forget to do this." =)

-Brett

From musiccomposition at gmail.com  Thu May 29 23:04:43 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Thu, 29 May 2008 16:04:43 -0500
Subject: [Python-3000] PEP: str(container) should call str(item),
	not repr(item)
In-Reply-To: <20080529195757.GB17896@phd.pp.ru>
References: <20080529192157.GA17896@phd.pp.ru>
	<ca471dc20805291231k56621365ud6b05c97f0659c01@mail.gmail.com>
	<20080529195757.GB17896@phd.pp.ru>
Message-ID: <1afaf6160805291404j3624a914hd08897303a79c9da@mail.gmail.com>

On Thu, May 29, 2008 at 2:57 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:
> On Thu, May 29, 2008 at 12:31:17PM -0700, Guido van Rossum wrote:
>> Let me just save everyone a lot of time and say that I'm opposed to
>> this change, and that I believe that it would cause way too much
>> disturbance to be accepted this close to beta.
>>
>> On Thu, May 29, 2008 at 12:21 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:
>> > PEP: XXX
>> > Title: str(container) should call str(item), not repr(item)
>
>   That's ok. A rejected PEP has its purpose, too. It will rest peacefully
> in the archive, holding all arguments consolidated and will serve as a point
> of reference.
>   Any objection if I demand it be properly registered, assigned a number
> and then rejected?

I've added it for you. See r63794.

>
> PS. Am I the champion whose PEP has been killed before I even finished it? ;)

Probably not. :)
>
> Oleg.
> --
>     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
>           Programmers don't die, they just GOSUB without RETURN.


-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From phd at phd.pp.ru  Thu May 29 23:16:58 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Fri, 30 May 2008 01:16:58 +0400
Subject: [Python-3000] PEP 3140: str(container) should call str(item),
	not repr(item)
In-Reply-To: <1afaf6160805291404j3624a914hd08897303a79c9da@mail.gmail.com>
References: <20080529192157.GA17896@phd.pp.ru>
	<ca471dc20805291231k56621365ud6b05c97f0659c01@mail.gmail.com>
	<20080529195757.GB17896@phd.pp.ru>
	<1afaf6160805291404j3624a914hd08897303a79c9da@mail.gmail.com>
Message-ID: <20080529211658.GB19274@phd.pp.ru>

On Thu, May 29, 2008 at 04:04:43PM -0500, Benjamin Peterson wrote:
> On Thu, May 29, 2008 at 2:57 PM, Oleg Broytmann <phd at phd.pp.ru> wrote:
> >   Any objection if I demand it be properly registered, assigned a number
> > and then rejected?
> 
> I've added it for you. See r63794.

   Thank you!

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From greg.ewing at canterbury.ac.nz  Fri May 30 00:26:12 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 30 May 2008 10:26:12 +1200
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
Message-ID: <483F2D84.10604@canterbury.ac.nz>

Daniel Wong wrote:

> Are there plans for introducing syntax like this:
> 
> (a, (b[2], c)) = ('big', ('red', 'dog'))

I think you'll find Guido has made another trip
in the time machine for this one:

Python 2.3 (#1, Aug  5 2003, 15:52:30)
[GCC 3.1 20020420 (prerelease)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
 >>> b = [0,1,2]
 >>> (a, (b[2], c)) = ('big', ('red', 'dog'))
 >>> a
'big'
 >>> b
[0, 1, 'red']
 >>> c
'dog'
 >>>
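
The same behaviour can be checked as a small script. This is only a sketch of the existing nested-target assignment shown in the transcript above; the variable names pairs and sums are made up for the example.

```python
# Nested targets may mix plain names with subscript targets such as b[2].
b = [0, 1, 2]
(a, (b[2], c)) = ("big", ("red", "dog"))
print(a, b, c)  # big [0, 1, 'red'] dog

# The same target syntax is accepted wherever assignment targets are,
# for example in a comprehension's "for" clause.
pairs = [("p", (1, 2)), ("q", (3, 4))]
sums = [(name, x + y) for name, (x, y) in pairs]
print(sums)  # [('p', 3), ('q', 7)]
```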

-- 
Greg

From ncoghlan at gmail.com  Fri May 30 00:57:07 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 30 May 2008 08:57:07 +1000
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483EF139.8000606@egenix.com>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>	<483D300B.5090309@egenix.com>	<52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com>	<483ECA52.6040000@egenix.com>	<483ECF94.7060607@cheimes.de>
	<483EF139.8000606@egenix.com>
Message-ID: <483F34C3.3050402@gmail.com>

M.-A. Lemburg wrote:
> * Why can't we have both PyString *and* PyBytes exposed in 2.x,
> with one redirecting to the other ?

We do have that - the PyString_* names still work perfectly fine in 2.x. 
They just won't be used in the Python core codebase anymore - everything 
in the Python core will use either PyBytes_* or PyUnicode_* regardless 
of which branch (2.x or 3.x) you're working on. I think that's a good 
thing for ease of maintenance in the future, even if it takes people a 
while to get their heads around it right now.

> * Why should the 2.x code base turn to hacks, just because 3.x wants
> to restructure itself ?

With the better explanation from Greg of what the checked in approach 
achieves (i.e. preserving exact ABI compatibility for PyString_*, while 
allowing PyBytes_* to be used at the source code level), I don't see 
what has been done as being any more of a hack than the possibly more 
common "#define <oldname> <newname>" (which *would* break binary 
compatibility).

The only things that I think would tidy it up further would be to:
- include an explanation of the approach and its effects on API and ABI 
backward and forward compatibility within 2.x and between 2.x and 3.x in 
stringobject.h
- expose the PyBytes_* functions to the linker in 2.6 as well as 3.0

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From ncoghlan at gmail.com  Fri May 30 01:10:23 2008
From: ncoghlan at gmail.com (Nick Coghlan)
Date: Fri, 30 May 2008 09:10:23 +1000
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <483F06FF.9090007@trueblade.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>	<483EAF95.5050503@trueblade.com>	<3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>
	<483F06FF.9090007@trueblade.com>
Message-ID: <483F37DF.5060507@gmail.com>

Eric Smith wrote:
> Marcin 'Qrczak' Kowalczyk wrote:
>> 2008/5/29 Eric Smith <eric+python-dev at trueblade.com>:
>>
>>> I don't see it as a big problem.  You can now use any prefix you want,
>>> instead of the hard coded values that # supplied.
>>
>> Except that it works incorrectly for negative numbers.
> 
> Excellent point.  If only this had been brought up back when the PEP was 
> written :(
> 
> Any suggestions on how to improve the situation?  I guess we could add 
> '#' back in to the format specifier.  I can't really think of any other 
> way that doesn't involve converting the number to a string and then 
> operating on that, just to get the sign.
> 
> I'm reasonably sure I could implement that before the beta (next 
> Wednesday) if a decision is reached before this weekend.

Doing the right thing for negative numbers is a good point. It also
means the prefix can be handled properly when dealing with aligned
fields. I'd suggest the following update to the standard format
specifier in the PEP:

   [[fill]align][#][sign][0][minimumwidth][.precision][type]

   The '#' prefix option inserts the appropriate prefix characters 
('0b', '0o', '0x', '0X') when displaying numbers in binary, octal or 
hexadecimal formats. The prefix is inserted into the displayed number 
after the sign character and fill characters (if any), but before any 
leading zeroes.
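
For illustration, this is how a '#' option of this kind behaves in current Python 3, where it did end up in the format specifier. Treat it as a sketch of the idea rather than of the draft wording above: in released versions the '#' follows the sign in the spec, so the examples below avoid combining the two.

```python
# The prefix goes after the sign, so negative numbers come out right.
print(format(45, "#x"))    # 0x2d
print(format(-45, "#x"))   # -0x2d
print(format(-45, "#o"))   # -0o55
print(format(5, "#b"))     # 0b101

# With zero padding, the prefix comes before the leading zeroes.
print("{0:#010x}".format(45))    # 0x0000002d
print("{0:#010x}".format(-45))   # -0x000002d

# With alignment, fill characters come first, then sign, then prefix.
print("{0:>#8x}".format(-45))    #    -0x2d
```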

Cheers,
Nick.
-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

From wescpy at gmail.com  Fri May 30 01:34:39 2008
From: wescpy at gmail.com (wesley chun)
Date: Thu, 29 May 2008 16:34:39 -0700
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <483F06FF.9090007@trueblade.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
	<483EAF95.5050503@trueblade.com>
	<3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>
	<483F06FF.9090007@trueblade.com>
Message-ID: <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com>

On 5/29/08, Eric Smith <eric+python-dev at trueblade.com> wrote:
> Marcin 'Qrczak' Kowalczyk wrote:
> > Except that it works incorrectly for negative numbers.

wow, that is a great point.  i didn't think of this either. it makes
it very inconvenient (see below) and makes it more difficult to say
we've completely replaced the '%' operator.


>  I can't really think of any other way that doesn't involve converting the
> number to a string and then operating on that, just to get the sign.

here's one way of doing it without converting to a string first (it's ugly too):

>>> i = -45
>>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i))
'-0x2d'
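
to spell out the problem next to the workaround, here's a small sketch. hex_with_prefix is a made-up helper name, it just wraps the expression above:

```python
def hex_with_prefix(i):
    """Hypothetical helper: hex with a '0x' prefix placed after the sign."""
    sign = "-" if i < 0 else ""
    return "{0}0x{1:x}".format(sign, abs(i))


# Hard-coding the prefix in the template puts it before the sign.
naive = "0x{0:x}".format(-45)
print(naive)                  # 0x-2d

# Splitting off the sign by hand gives the expected result.
print(hex_with_prefix(-45))   # -0x2d
print(hex_with_prefix(45))    # 0x2d
```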

thx for putting it (back) in,
-wesley

From humberto at digi.com.br  Fri May 30 02:25:21 2008
From: humberto at digi.com.br (Humberto Diogenes)
Date: Thu, 29 May 2008 21:25:21 -0300
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <bbaeab100805291032v7749ba6n1752a0ef5be9cbca@mail.gmail.com>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>
	<g1ll0v$i0p$1@ger.gmane.org>
	<bbaeab100805291032v7749ba6n1752a0ef5be9cbca@mail.gmail.com>
Message-ID: <21258346-305C-439D-A7F0-EE945FE77B37@digi.com.br>


On 29/05/2008, at 14:32, Brett Cannon wrote:

> On Thu, May 29, 2008 at 12:12 AM, Georg Brandl <g.brandl at gmx.net>  
> wrote:
>>
>>> Issue 2848 - mimetools has been deprecated for a while, but it is
>>> still used in a bunch of places. Since this has been deprecated in  
>>> PEP
>>> 4 for a long time, should we add the removal warning in 2.6 now and
>>> then make its actual removal of usage something to do by another  
>>> beta?
>>>
>>> Issue 2849 - rfc822 is the same problem as mimetools.
>>
>> The problem is that nobody seems to know what exactly distinguishes
>> mimetools/rfc822' classes and its successor's (email's) classes, so
>> it's hard to replace it in the stdlib.
>>
>
> Right. I have looked myself over the years and it never seemed
> brain-dead simple.


Well, as documented in issue 2849, rfc822 is almost gone. I've already  
removed it from mailbox and test_urllib2 modules. It seems that there  
remains only one important use of it, which is in  
cgi.FieldStorage.read_multi().

I couldn't figure out how to replace it there, though, as read_multi's  
current implementation relies on the fact that rfc822.Message(fp)  
advances the file pointer just by the amount it needs, while  
email.parser.Parser.parse() reads the whole file.
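
One possible direction, sketched under assumptions: read only the header block from the file by hand, and give just that text to the email parser, so the file pointer stays at the start of the body. The helper name read_headers is hypothetical, it is not part of cgi or email, and this is not a tested replacement for what read_multi does today.

```python
import io
from email.parser import Parser


def read_headers(fp):
    """Consume lines from fp up to and including the blank separator line,
    then parse only that block as message headers.

    Hypothetical helper for illustration; assumes a text-mode file object.
    """
    lines = []
    while True:
        line = fp.readline()
        if not line:                  # end of file, no body at all
            break
        lines.append(line)
        if line in ("\n", "\r\n"):    # blank line ends the header block
            break
    return Parser().parsestr("".join(lines), headersonly=True)


fp = io.StringIO("Content-Type: text/plain\nX-Demo: 1\n\nbody stays unread\n")
msg = read_headers(fp)
rest = fp.read()                      # everything after the headers

print(msg["Content-Type"])            # text/plain
print(repr(rest))                     # 'body stays unread\n'
```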

I believe that read_multi can be rewritten in a way that's compatible  
with email.parser, but I don't know how to do that... :\

--
Humberto Diógenes
http://humberto.digi.com.br


From eric+python-dev at trueblade.com  Fri May 30 02:45:46 2008
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Thu, 29 May 2008 20:45:46 -0400
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>	
	<483EAF95.5050503@trueblade.com>	
	<3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>	
	<483F06FF.9090007@trueblade.com>
	<78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com>
Message-ID: <483F4E3A.9090403@trueblade.com>

wesley chun wrote:
> On 5/29/08, Eric Smith <eric+python-dev at trueblade.com> wrote:
>> Marcin 'Qrczak' Kowalczyk wrote:
>>> Except that it works incorrectly for negative numbers.
> 
> wow, that is a great point.  i didn't think of this either. it makes
> it very inconvenient (see below) and makes it more difficult to say
> we've completely replaced the '%' operator.
> 
> 
>>  I can't really think of any other way that doesn't involve converting the
>> number to a string and then operating on that, just to get the sign.
> 
> here's one way of doing it without converting to a string first (it's ugly too):
> 
>>>> i = -45
>>>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i))
> '-0x2d'

Agreed, ick!

> thx for putting it (back) in,

I didn't say I would, I said I would if a decision was reached :)  I'd 
like to see some more consensus, and I hope that Talin (the PEP author) 
chimes in.

Eric.


From greg.ewing at canterbury.ac.nz  Fri May 30 03:07:47 2008
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Fri, 30 May 2008 13:07:47 +1200
Subject: [Python-3000] suggestion: structured assignment
In-Reply-To: <7c8225f20805291744n62431d93y1c0c484d0387a01d@mail.gmail.com>
References: <7c8225f20805281823y4b030a5fh5231f74ca4287f4@mail.gmail.com>
	<483F2D84.10604@canterbury.ac.nz>
	<7c8225f20805291744n62431d93y1c0c484d0387a01d@mail.gmail.com>
Message-ID: <483F5363.6030409@canterbury.ac.nz>

Daniel Wong wrote:
> Ironic that you should mention it. He already mentioned it.

The time machine thing is pretty much a standard
joke in the Python community, which goes to show
how common it is for people to be pleasantly
surprised by what Python already does.

I think everyone's being a bit hard on Mr. Wong
here. When you're new to Python, you don't always
realise at first how deep and subtle a thing
you're dealing with. I was guilty of failing to
follow the try-it-first rule myself in the early
days. I soon learned better.

In fact, it works the other way too -- the
transcript I posted was the result of me thinking
"Doesn't that already work? Hang on, I'd better
try it first to make sure it really does..."
:-)

-- 
Greg

From greg at krypto.org  Fri May 30 09:45:25 2008
From: greg at krypto.org (Gregory P. Smith)
Date: Fri, 30 May 2008 00:45:25 -0700
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483F34C3.3050402@gmail.com>
References: <48397ECC.9070805@cheimes.de> <483AD138.7000804@egenix.com>
	<483B2D02.8040400@cheimes.de> <483BDE11.509@egenix.com>
	<483D300B.5090309@egenix.com>
	<52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com>
	<483ECA52.6040000@egenix.com> <483ECF94.7060607@cheimes.de>
	<483EF139.8000606@egenix.com> <483F34C3.3050402@gmail.com>
Message-ID: <52dc1c820805300045g5c37256em31de5f5d76dc365b@mail.gmail.com>

On Thu, May 29, 2008 at 3:57 PM, Nick Coghlan <ncoghlan at gmail.com> wrote:
> M.-A. Lemburg wrote:
>> * Why should the 2.x code base turn to hacks, just because 3.x wants
>> to restructure itself ?
>
> With the better explanation from Greg of what the checked in approach
> achieves (i.e. preserving exact ABI compatibility for PyString_*, while
> allowing PyBytes_* to be used at the source code level), I don't see what
> has been done as being any more of a hack than the possibly more common
> "#define <oldname> <newname>" (which *would* break binary compatibility).
>
> The only things that I think would tidy it up further would be to:
> - include an explanation of the approach and its effects on API and ABI
> backward and forward compatibility within 2.x and between 2.x and 3.x in
> stringobject.h
> - expose the PyBytes_* functions to the linker in 2.6 as well as 3.0

Yes that is the only complaint I believe I really see left at this
point.  It is easy enough to fix.

Change the current stringobject.h "#define PyBytes_Foo PyString_Foo"
approach into a .c file that defines one line stub functions for all
PyString_Foo() functions to call actual PyBytes_Foo() functions.

I'd even go so far as to put the one line alternate name stubs in the
Objects/bytesobject.c and .h file right next to the PyBytes_Foo()
method definitions so that it's clear from reading a single file that
they are the same thing.

The performance implications of this are minor all things considered
(a single absolute jmp given a good compiler) and regardless of what
we do should only apply to extension modules, not the core.

If we do the above in trunk will this thread end?

I'm personally not really clear on why we need PyBytes_Foo to show up
in the -binary- ABI in 2.6.  The #define's are enough for me but I'm
happy to make this compromise.

No 2.x books, documentation or literature will be invalidated by the
changes regardless.

-gps

From oliphant.travis at ieee.org  Fri May 30 10:21:59 2008
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri, 30 May 2008 03:21:59 -0500
Subject: [Python-3000] Single buffer implied in new buffer protocol?
In-Reply-To: <g1lhlt$988$1@ger.gmane.org>
References: <g1675f$hf4$1@ger.gmane.org> <g1hvto$kh3$1@ger.gmane.org>
	<g1lhlt$988$1@ger.gmane.org>
Message-ID: <g1odf8$ga8$1@ger.gmane.org>

Stefan Behnel wrote:
> Travis Oliphant wrote:
>> Stefan Behnel wrote:
>>> Anyway, my point is that this part of the protocol actually implies
>>> setting a
>>> lock on the buffer *provider* rather than the buffer itself, as the
>>> buffer
>>> provider cannot distinguish between different buffers based on a NULL
>>> pointer
>> Yes, the language in the PEP could be more clear.   Obviously, if you
>> haven't provided a Py_buffer structure to fill in, then you are only
>> asking to lock the object's buffer from other access.
> 
> That's what I'm questioning below.
>

I see what you are referring to.  The protocol to lock the buffer after 
requesting and obtaining one was not well thought out.  I think the use 
case I had in mind was locking in the buffer before actually getting it.

Once you have a buffer, I see how you may want to lock the buffer after 
getting it.   For example, I could see how you may want to go from a 
non-locked read/write where you are guaranteed by the object that it 
won't move the memory but not that someone hasn't written to the memory 
area to an exclusive write-lock where no-one else can write to the area 
until you are done.

This should be clarified in the PEP.  Can you take a stab at it?

-Travis


From mal at egenix.com  Fri May 30 10:37:08 2008
From: mal at egenix.com (M.-A. Lemburg)
Date: Fri, 30 May 2008 10:37:08 +0200
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <483F34C3.3050402@gmail.com>
References: <48397ECC.9070805@cheimes.de>	<483ABB23.6050900@egenix.com>	<483ABDCF.8000105@cheimes.de>	<483AD138.7000804@egenix.com>	<483B2D02.8040400@cheimes.de>	<483BDE11.509@egenix.com>	<483D300B.5090309@egenix.com>	<52dc1c820805281347n75a4baax9222b75c8fa09ec5@mail.gmail.com>	<483ECA52.6040000@egenix.com>	<483ECF94.7060607@cheimes.de>	<483EF139.8000606@egenix.com>
	<483F34C3.3050402@gmail.com>
Message-ID: <483FBCB4.5020007@egenix.com>

On 2008-05-30 00:57, Nick Coghlan wrote:
> M.-A. Lemburg wrote:
>> * Why can't we have both PyString *and* PyBytes exposed in 2.x,
>> with one redirecting to the other ?
> 
> We do have that - the PyString_* names still work perfectly fine in 2.x. 
> They just won't be used in the Python core codebase anymore - everything 
> in the Python core will use either PyBytes_* or PyUnicode_* regardless 
> of which branch (2.x or 3.x) you're working on. I think that's a good 
> thing for ease of maintenance in the future, even if it takes people a 
> while to get their heads around it right now.

Sorry, I probably wasn't clear enough:

Why can't we have both PyString *and* PyBytes exposed as C
APIs (ie. visible in code and in the linker) in 2.x, with one redirecting
to the other ?

>> * Why should the 2.x code base turn to hacks, just because 3.x wants
>> to restructure itself ?
> 
> With the better explanation from Greg of what the checked in approach 
> achieves (i.e. preserving exact ABI compatibility for PyString_*, while 
> allowing PyBytes_* to be used at the source code level), I don't see 
> what has been done as being any more of a hack than the possibly more 
> common "#define <oldname> <newname>" (which *would* break binary 
> compatibility).
> 
> The only things that I think would tidy it up further would be to:
> - include an explanation of the approach and its effects on API and ABI 
> backward and forward compatibility within 2.x and between 2.x and 3.x in 
> stringobject.h
> - expose the PyBytes_* functions to the linker in 2.6 as well as 3.0

Which is what I was suggesting all along; sorry if I wasn't
clear enough on that.

The standard approach is that you provide #define redirects from the
old APIs to the new ones (which are then picked up by the compiler)
*and* add function wrappers to the same effect (to make linkers,
dynamic load APIs such as ctypes and debuggers happy).


Example from pythonrun.h|c:
---------------------------

/* Use macros for a bunch of old variants */
#define PyRun_String(str, s, g, l) PyRun_StringFlags(str, s, g, l, NULL)

/* Deprecated C API functions still provided for binary compatibility */

#undef PyRun_String
PyAPI_FUNC(PyObject *)
PyRun_String(const char *str, int s, PyObject *g, PyObject *l)
{
	return PyRun_StringFlags(str, s, g, l, NULL);
}


I still believe that we should *not* make "ease of merging" the
primary motivation for backporting changes in 3.x to 2.x. Software
design should not be guided by restrictions in the tool chain,
if not absolutely necessary.

The main argument for a backport needs to be general usefulness
to the 2.x users, IMHO... just like any other feature that
makes it into 2.x.

If merging is difficult then this needs to be addressed, but
there are more options to that than always going back to the
original 2.x trunk code. I've given a few suggestions on how
this could be approached in other emails on this thread.

-- 
Marc-Andre Lemburg
eGenix.com


From jgennis at gmail.com  Fri May 30 11:13:08 2008
From: jgennis at gmail.com (Jamie Gennis)
Date: Fri, 30 May 2008 02:13:08 -0700
Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka StringABC)
In-Reply-To: <1791A55949334749A9DF448461C9B11A@RaymondLaptop1>
References: <loom.20080527T192243-415@post.gmane.org>
	<ca471dc20805271242q63e80104mfad81d1b1014df60@mail.gmail.com>
	<e5fff6640805271309ybe102d3y3526f0e32141df2@mail.gmail.com>
	<fb6fbf560805271540s9bd4eb6oaa53a3f79dfef688@mail.gmail.com>
	<1791A55949334749A9DF448461C9B11A@RaymondLaptop1>
Message-ID: <e90059230805300213y72f26092y9943ba9ff94c619@mail.gmail.com>

Perhaps drawing a distinction between containers (or maybe "collections"?),
and non-container iterables is appropriate?  I would define containers as
objects that can be iterated over multiple times and for which iteration
does not instantiate new objects.  By this definition generators would not
be considered containers (but views would), and for practicality it may be
worth also having an ABC for containers-and-generators (no idea what to name
it).  This would result in the following hierarchy:

iterables
- strings, bytes, etc.
- containers-and-generators
- - containers
- - - tuple, list, set, dict views, etc.
- - generators

I don't think there needs to be different operations defined for the
different ABCs.  They're all just iterables with different iteration
semantics.
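
A rough Python sketch of this hierarchy, purely for illustration. The ABC
names below (ContainerOrGenerator, ReiterableContainer, OneShotGenerator)
and the flatten() helper are made up for this example and are not part of
any proposal or of the standard library. Strings and bytes are deliberately
left unregistered, so a flattener that only descends into
ContainerOrGenerator instances treats them as atomic.

```python
from abc import ABC
from types import GeneratorType


class ContainerOrGenerator(ABC):
    """Hypothetical ABC: iterables that a flattener may descend into."""


class ReiterableContainer(ContainerOrGenerator):
    """Hypothetical ABC: can be iterated repeatedly without creating new objects."""


class OneShotGenerator(ContainerOrGenerator):
    """Hypothetical ABC: iterables that are consumed by iteration."""


# Register the built-in types named in the hierarchy above.
for _cls in (tuple, list, set, frozenset,
             type({}.keys()), type({}.values()), type({}.items())):
    ReiterableContainer.register(_cls)
OneShotGenerator.register(GeneratorType)


def flatten(obj):
    """Yield the leaves of obj; str and bytes are not registered, so they stay whole."""
    if isinstance(obj, ContainerOrGenerator):
        for item in obj:
            yield from flatten(item)
    else:
        yield obj


nested = [1, ("ab", [2]), (x for x in [3, b"cd"])]
print(list(flatten(nested)))
```

Registering a type with one of the child ABCs is enough for isinstance()
checks against the parent to succeed, which is why flatten() only needs to
test for ContainerOrGenerator.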

Jamie

On Tue, May 27, 2008 at 3:54 PM, Raymond Hettinger <python at rcn.com> wrote:

> "Jim Jewett"
>
>> It isn't really stringiness that matters, it is that you have to
>> terminate even though you still have an iterable container.
>>
>
> Well said.
>
>
>  Guido had at least a start in Searchable, back when ABC
>> were still in the sandbox:
>>
>
> Have to disagree here.  An object cannot know in general
> whether a flattener wants to split it or not.  That is an application
> dependent decision.  A better answer is to be able to tell the
> flattener what should be considered atomic in a given circumstance.
>
>
> Raymond
>

From solipsis at pitrou.net  Fri May 30 11:51:45 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 30 May 2008 09:51:45 +0000 (UTC)
Subject: [Python-3000] Exception re-raising woes
References: <loom.20080526T091655-961@post.gmane.org>
Message-ID: <loom.20080530T094609-317@post.gmane.org>


Hi,

I'm surprised that nobody except Robert Brewer reacted to my proposal. The two
relevant bugs (#2507 and #2833) have been marked respectively as "critical" and
"release blocker", so I thought at least some people felt concerned :-)

Should I wait a bit for people to react and give a qualified opinion, or should
I assume one of the following implicit answers (and if so, which one!):

- we don't really care about re-raising, just fix #2507 the simple way so that
exception state is properly cleaned up
- we must fix both #2507 and #2833 in a clean way, and your proposal looks fine
- we must fix both #2507 and #2833 in a clean way, but your proposal is
completely bogus

cheers

Antoine.


From g.brandl at gmx.net  Fri May 30 14:19:23 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 30 May 2008 14:19:23 +0200
Subject: [Python-3000] urllib.quote/unquote behavior?
Message-ID: <g1orce$10m$1@ger.gmane.org>

Hi,

Python 3.0's urllib.quote() and unquote() handle non-ASCII data strangely.
quote() encodes characters with codepoint < 256 using latin-1, but others
using utf-8. unquote() decodes everything using latin-1.

Is the correct behavior to always use utf-8?

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From phd at phd.pp.ru  Fri May 30 14:42:30 2008
From: phd at phd.pp.ru (Oleg Broytmann)
Date: Fri, 30 May 2008 16:42:30 +0400
Subject: [Python-3000] urllib.quote/unquote behavior?
In-Reply-To: <g1orce$10m$1@ger.gmane.org>
References: <g1orce$10m$1@ger.gmane.org>
Message-ID: <20080530124230.GA32657@phd.pp.ru>

On Fri, May 30, 2008 at 02:19:23PM +0200, Georg Brandl wrote:
> Python 3.0's urllib.quote() and unquote() handle non-ASCII data strangely.
> quote() encodes characters with codepoint < 256 using latin-1, but others
> using utf-8. unquote() decodes everything using latin-1.
> 
> Is the correct behavior to always use utf-8?

   Always UTF-8. See
http://en.wikipedia.org/wiki/Percent-encoding#Current_standard

Oleg.
-- 
     Oleg Broytmann            http://phd.pp.ru/            phd at phd.pp.ru
           Programmers don't die, they just GOSUB without RETURN.

From solipsis at pitrou.net  Fri May 30 16:07:32 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 30 May 2008 14:07:32 +0000 (UTC)
Subject: [Python-3000] urllib.quote/unquote behavior?
References: <g1orce$10m$1@ger.gmane.org> <20080530124230.GA32657@phd.pp.ru>
Message-ID: <loom.20080530T140416-422@post.gmane.org>

Oleg Broytmann <phd <at> phd.pp.ru> writes:
> On Fri, May 30, 2008 at 02:19:23PM +0200, Georg Brandl wrote:
> > Python 3.0's urllib.quote() and unquote() handle non-ASCII data strangely.
> > quote() encodes characters with codepoint < 256 using latin-1, but others
> > using utf-8. unquote() decodes everything using latin-1.
> > 
> > Is the correct behavior to always use utf-8?
> 
>    Always UTF-8. See
> http://en.wikipedia.org/wiki/Percent-encoding#Current_standard

Well, according to your link things are not that simple:
""" This requirement was introduced in January 2005 with the publication of RFC
3986. URI schemes introduced before this date are not affected. """

Practically, in the particular case of HTTP, you must probably distinguish
between the file path part (before the ? sign) and the query string part (after
the ? sign). The file path percent-encoding may depend on the actual filesystem
encoding, or the Web server configuration. The query string percent-encoding may
depend on the actual Web application being queried, or the programming language
in which it's written, or anything else altogether :-)
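
To make the ambiguity concrete, here is a small sketch using
urllib.parse.quote() and unquote() as they exist in current Python 3
releases (the module layout differs from the urllib.quote() spelling used
in this thread). The encoding is an explicit parameter there, UTF-8 by
default; latin-1 stands in for a hypothetical legacy server and is not a
recommendation.

```python
from urllib.parse import quote, unquote

text = "caf\u00e9"  # 'cafe' with an acute accent on the final letter

utf8_quoted = quote(text)                        # default encoding is UTF-8
latin1_quoted = quote(text, encoding="latin-1")  # what a latin-1 peer would send

print(utf8_quoted)    # caf%C3%A9
print(latin1_quoted)  # caf%E9

# Each form round-trips only if the decoder uses the matching encoding.
assert unquote(utf8_quoted) == text
assert unquote(latin1_quoted, encoding="latin-1") == text

# Decoding latin-1 data as UTF-8 does not fail, it silently substitutes
# U+FFFD because unquote() defaults to errors='replace'.
mismatched = unquote(latin1_quoted)
print(repr(mismatched))
```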

Regards

Antoine.


From divinekid at gmail.com  Fri May 30 16:57:45 2008
From: divinekid at gmail.com (Haoyu Bai)
Date: Fri, 30 May 2008 22:57:45 +0800
Subject: [Python-3000] Any plan to export PyInstanceMethod?
Message-ID: <484015E9.1060206@gmail.com>

Hello,

I see that there is a PyInstanceMethod_New C API in Python 3, which
replaces the old new.instancemethod. However, it is not currently
exported to the Python namespace, either as a builtin or through
another module.

Is there any plan to export it?

Thank you!

Best regards,

Haoyu Bai
5/30/2008

From rrr at ronadam.com  Fri May 30 17:50:08 2008
From: rrr at ronadam.com (Ron Adam)
Date: Fri, 30 May 2008 10:50:08 -0500
Subject: [Python-3000] [Python-Dev] Iterable String Redux (aka StringABC)
In-Reply-To: <1791A55949334749A9DF448461C9B11A@RaymondLaptop1>
References: <loom.20080527T192243-415@post.gmane.org><ca471dc20805271242q63e80104mfad81d1b1014df60@mail.gmail.com><e5fff6640805271309ybe102d3y3526f0e32141df2@mail.gmail.com>	<fb6fbf560805271540s9bd4eb6oaa53a3f79dfef688@mail.gmail.com>
	<1791A55949334749A9DF448461C9B11A@RaymondLaptop1>
Message-ID: <48402230.3020505@ronadam.com>


Raymond Hettinger wrote:
> "Jim Jewett"
>> It isn't really stringiness that matters, it is that you have to
>> terminate even though you still have an iterable container.
> 
> Well said.
> 
> 
>> Guido had at least a start in Searchable, back when ABC
>> were still in the sandbox:
> 
> Have to disagree here.  An object cannot know in general
> whether a flattener wants to split it or not.  That is an application
> dependent decision.  A better answer is to be able to tell the
> flattener what should be considered atomic in a given circumstance.
> 
> 
> Raymond

A while back (a couple of years I think), we had a discussion on 
python-list about flatten in which I posted the following version of a 
flatten function. It turned out to be nearly twice as fast as any other 
version.


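def flatten(L):
     """ Flatten a list in place. """
     i = 0
     while i < len(L):
         # Splice nested lists into L.  The bounds check covers an empty
         # nested list at the end, whose removal shortens L.
         while i < len(L) and type(L[i]) is list:
             L[i:i+1] = L[i]
         i += 1
     return L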


For this to work the object to be flattened needs to be both mutable and 
list like.  At the moment I can't think of any reason I would want to 
flatten anything that was not list like.


To make it a bit more flexible it could be changed just a bit.


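def flatten(L):
     """ Flatten a list-like object in place. """
     objtype = type(L)
     i = 0
     while i < len(L):
         # Only splice items of the same type as the outer object.  The
         # bounds check covers an empty nested item at the end.
         while i < len(L) and type(L[i]) is objtype:
             L[i:i+1] = L[i]
         i += 1
     return L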


Generally, I don't think you would want to flatten dissimilar objects.
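
For comparison, here is a minimal sketch of the other approach Raymond
describes above, where the caller tells the flattener what to treat as
atomic. The name flatten_atomic and its atomic parameter are invented for
this example; it is a generator rather than an in-place operation, so it
makes no claim about matching the speed of the version above.

```python
from collections.abc import Iterable


def flatten_atomic(obj, atomic=(str, bytes)):
    """Yield the leaves of obj, never descending into instances of `atomic`."""
    if isinstance(obj, atomic) or not isinstance(obj, Iterable):
        yield obj
    else:
        for item in obj:
            yield from flatten_atomic(item, atomic)


data = [1, ["ab", (2, 3)], [[4]]]

# Default: strings and bytes are leaves, everything else iterable is split.
print(list(flatten_atomic(data)))

# The caller decides that tuples are atomic in this particular application.
print(list(flatten_atomic(data, atomic=(str, bytes, tuple))))
```

The decision about what counts as a leaf stays with the application, which
is the point of the objection to putting that knowledge in the object.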

Cheers,
    Ron


From guido at python.org  Fri May 30 19:30:53 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 30 May 2008 10:30:53 -0700
Subject: [Python-3000] Exception re-raising woes
In-Reply-To: <loom.20080530T094609-317@post.gmane.org>
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
Message-ID: <ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>

The issue you're raising is deep, subtle and complex -- I can't quite
fathom your proposal, and expect I'd have to spend at least an hour
with the source code before I could truly understand the issue and the
proposal. I haven't done that yet, so take the following with a grain
of salt.

That said, it seems you are proposing taking the logical consequence
of making except handlers properly nested and scoped, and if you can
come up with a patch to implement this, I think I could support it.

I would be okay as well with restricting bare raise syntactically to
appearing only inside an except block, to emphasize the change in
semantics that was started when we decided to make the optional
variable disappear at the end of the except block.

This would render the following code illegal:

def f():
  try: 1/0
  except: pass
  raise

I am fine with that, even if there are probably some uses of it that
may be a little tricky to rewrite. (The same happened when we reduced
the variable scope.)
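
For illustration, a sketch of the kind of rewrite this implies: a bare
raise stays lexically inside the except block, and code that wants to
re-raise later keeps an explicit reference to the exception. The function
names are made up for this example. (In CPython 3 as released, a bare
raise with no active exception is a RuntimeError at run time rather than a
syntax error.)

```python
def reraise_inside_handler():
    try:
        1 / 0
    except ZeroDivisionError:
        # Cleanup or logging would go here.
        raise  # bare raise, lexically inside the except block


def reraise_later():
    saved = None
    try:
        1 / 0
    except ZeroDivisionError as exc:
        # 'exc' is unbound when the block ends, so keep our own reference.
        saved = exc
    # ... other work that must happen before the error propagates ...
    if saved is not None:
        raise saved


for func in (reraise_inside_handler, reraise_later):
    try:
        func()
    except ZeroDivisionError as err:
        print(func.__name__, "re-raised", type(err).__name__)
```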

--Guido

On Fri, May 30, 2008 at 2:51 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Hi,
>
> I'm surprised that nobody except Robert Brewer reacted to my proposal. The two
> relevant bugs (#2507 and #2833) have been marked respectively as "critical" and
> "release blocker", so I thought at least some people felt concerned :-)
>
> Should I wait a bit for people to react and give a qualified opinion, or should
> I assume one of the following implicit answers (and if so, which one!):
>
> - we don't really care about re-raising, just fix #2507 the simple way so that
> exception state is properly cleaned up
> - we must fix both #2507 and #2833 in a clean way, and your proposal looks fine
> - we must fix both #2507 and #2833 in a clean way, but your proposal is
> completely bogus
>
> cheers
>
> Antoine.
>
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Fri May 30 19:40:34 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 30 May 2008 11:40:34 -0600
Subject: [Python-3000] Exception re-raising woes
In-Reply-To: <loom.20080530T094609-317@post.gmane.org>
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
Message-ID: <aac2c7cb0805301040s78839bb9y5610f98464651f25@mail.gmail.com>

On Fri, May 30, 2008 at 3:51 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
>
> Hi,
>
> I'm surprised that nobody except Robert Brewer reacted to my proposal. The two
> relevant bugs (#2507 and #2833) have been marked respectively as "critical" and
> "release blocker", so I thought at least some people felt concerned :-)

Flip side of the bikeshed effect.  Nobody feels confident in their
understanding so nobody comments.


> Should I wait a bit for people to react and give a qualified opinion, or should
> I assume one of the following implicit answers (and if so, which one!):
>
> - we don't really care about re-raising, just fix #2507 the simple way so that
> exception state is properly cleaned up
> - we must fix both #2507 and #2833 in a clean way, and your proposal looks fine
> - we must fix both #2507 and #2833 in a clean way, but your proposal is
> completely bogus

I'd like it if a bare "raise" became purely lexical (as Guido just
suggested), ditching all the magic.

However, things such as pdb.pm() still need access to the last
exception.  Maybe we can pare it down to the bare minimum, a per-thread
last_exception?  That'd quickly get clobbered (we should intentionally
clear when leaving an except block), but is that ever a problem?


-- 
Adam Olsen, aka Rhamphoryncus

From solipsis at pitrou.net  Fri May 30 20:10:31 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 30 May 2008 18:10:31 +0000 (UTC)
Subject: [Python-3000] Exception re-raising woes
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>
Message-ID: <loom.20080530T175853-576@post.gmane.org>


Hello,

Guido van Rossum <guido <at> python.org> writes:
> 
> That said, it seems you are proposing taking the logical consequence
> of making except handlers properly nested and scoped,

It's exactly that.

> I would be okay as well with restricting bare raise syntactically to
> appearing only inside an except block, to emphasize the change in
> semantics that was started when we decided to make the optional
> variable disappear at the end of the except block.
> 
> This would render the following code illegal:
> 
> def f():
>   try: 1/0
>   except: pass
>   raise

Please note as well that:

def f():
  try: 1/0
  except: pass
  return sys.exc_info()

would return (None, None, None).
Actually, it already does with the patch I proposed for #2507, and the test
suite runs fine after fixing a problem in doctest.py.
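
A small runnable check of that behaviour, as a sketch: the function names
are invented here, and it assumes the caller is not itself running inside
an except block (otherwise the caller's exception would be reported).

```python
import sys


def after_handler():
    try:
        1 / 0
    except ZeroDivisionError:
        pass
    # The handler has been left, so no exception is being handled any more.
    return sys.exc_info()


def inside_handler():
    try:
        1 / 0
    except ZeroDivisionError:
        # Still inside the handler: the exception state is visible.
        return sys.exc_info()


print(after_handler())
print(inside_handler()[0])
```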

Regards

Antoine.


From solipsis at pitrou.net  Fri May 30 20:16:20 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 30 May 2008 18:16:20 +0000 (UTC)
Subject: [Python-3000] Exception re-raising woes
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<aac2c7cb0805301040s78839bb9y5610f98464651f25@mail.gmail.com>
Message-ID: <loom.20080530T181056-760@post.gmane.org>

Adam Olsen <rhamph <at> gmail.com> writes:
> I'd like it if a bare "raise" became purely lexical (as Guido just
> suggested), ditching all the magic.
> 
> However, things such as pdb.pm() still need access to the last
> exception.  Maybe we can pare it down to the bare minimum, a per-thread
> last_exception?  That'd quickly get clobbered (we should intentionally
> clear when leaving an except block),

Well, the plan is to keep storing the current exception state in the
thread state structure, so sys.exc_info() would still work fine until we
leave the exception block.


From guido at python.org  Fri May 30 20:28:59 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 30 May 2008 11:28:59 -0700
Subject: [Python-3000] Exception re-raising woes
In-Reply-To: <aac2c7cb0805301040s78839bb9y5610f98464651f25@mail.gmail.com>
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<aac2c7cb0805301040s78839bb9y5610f98464651f25@mail.gmail.com>
Message-ID: <ca471dc20805301128m53b42a33tfdd6ebb61f336a6d@mail.gmail.com>

On Fri, May 30, 2008 at 10:40 AM, Adam Olsen <rhamph at gmail.com> wrote:
> I'd like it if a bare "raise" became purely lexical (as Guido just
> suggested), ditching all the magic.
>
> However, things such as pdb.pm() still need access to the last
> exception.  Maybe we can pare it down to the bare minimum, a per-thread
> last_exception?  That'd quickly get clobbered (we should intentionally
> clear when leaving an except block), but is that ever a problem?

No, pdb.pm() uses sys.last_*, not sys.exc_*. These are three variables
set only when an unhandled exception reaches the interactive prompt
and prints a traceback there. So no worries.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido at python.org  Fri May 30 20:31:22 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 30 May 2008 11:31:22 -0700
Subject: [Python-3000] Exception re-raising woes
In-Reply-To: <loom.20080530T175853-576@post.gmane.org>
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>
	<loom.20080530T175853-576@post.gmane.org>
Message-ID: <ca471dc20805301131p5e7a692fs266a2c0e9c0e4296@mail.gmail.com>

On Fri, May 30, 2008 at 11:10 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Guido van Rossum <guido <at> python.org> writes:
>>
>> That said, it seems you are proposing taking the logical consequence
>> of making except handlers properly nested and scoped,
>
> It's exactly that.
>
>> I would be okay as well with restricting bare raise syntactically to
>> appearing only inside an except block, to emphasize the change in
>> semantics that was started when we decided to make the optional
>> variable disappear at the end of the except block.
>>
>> This would render the following code illegal:
>>
>> def f():
>>   try: 1/0
>>   except: pass
>>   raise
>
> Please note as well that:
>
> def f():
>  try: 1/0
>  except: pass
>  return sys.exc_info()
>
> would return (None, None, None).
> Actually, it already does with the patch I proposed for #2507, and the test
> suite runs fine after fixing a problem in doctest.py.

I'm fine with that. Since in 3.0 sys.exc_info() returns nothing that
isn't accessible from the caught variable, the only reason to use it
is that it makes the exception available to functions *called* from
the except clause. (E.g. logging.exception() works this way.)
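
A minimal sketch of that pattern, with a made-up helper standing in for
something like logging.exception(): the helper takes no exception argument
and picks up the current one through sys.exc_info().

```python
import sys


def describe_current_exception():
    """Hypothetical helper: report whatever exception the caller is handling."""
    exc_type, exc_value, _tb = sys.exc_info()
    if exc_type is None:
        return "no exception being handled"
    return "{0}: {1}".format(exc_type.__name__, exc_value)


def run():
    try:
        1 / 0
    except ZeroDivisionError:
        # Called from inside the except clause, so the helper sees the error.
        return describe_current_exception()


print(run())
print(describe_current_exception())
```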

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Fri May 30 20:54:29 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 30 May 2008 12:54:29 -0600
Subject: [Python-3000] Exception re-raising woes
In-Reply-To: <loom.20080530T181056-760@post.gmane.org>
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<aac2c7cb0805301040s78839bb9y5610f98464651f25@mail.gmail.com>
	<loom.20080530T181056-760@post.gmane.org>
Message-ID: <aac2c7cb0805301154n246c85cagef2844362c8ab943@mail.gmail.com>

On Fri, May 30, 2008 at 12:16 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Adam Olsen <rhamph <at> gmail.com> writes:
>> I'd like it if a bare "raise" became purely lexical (as Guido just
>> suggested), ditching all the magic.
>>
>> However, things such as pdb.pm() still need access to the last
>> exception.  Maybe we can pare it down to the bare minimum, a per-thread
>> last_exception?  That'd quickly get clobbered (we should intentionally
>> clear when leaving an except block),
>
> Well, the plan is to keep storing the current exception state in the
> thread state structure, so sys.exc_info() would still work fine until we
> leave the exception block.

Just to be clear, you'll remove PyFrameObject's
f_exc_{type,value,traceback}, and rely exclusively on sys.exc_info(),
right?


-- 
Adam Olsen, aka Rhamphoryncus

From solipsis at pitrou.net  Fri May 30 21:02:43 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Fri, 30 May 2008 19:02:43 +0000 (UTC)
Subject: [Python-3000] Exception re-raising woes
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<aac2c7cb0805301040s78839bb9y5610f98464651f25@mail.gmail.com>
	<loom.20080530T181056-760@post.gmane.org>
	<aac2c7cb0805301154n246c85cagef2844362c8ab943@mail.gmail.com>
Message-ID: <loom.20080530T185943-251@post.gmane.org>

Adam Olsen <rhamph <at> gmail.com> writes:
> 
> Just to be clear, you'll remove PyFrameObject's
> f_exc_{type,value,traceback},

Yes.

> and rely exclusively on sys.exc_info(),
> right?

More exactly, tstate->exc_* will continue storing the current state, and 
sys.exc_info() will continue relying on these values.

regards

Antoine.


From g.brandl at gmx.net  Fri May 30 21:08:33 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Fri, 30 May 2008 21:08:33 +0200
Subject: [Python-3000] Mac module removal complete?
Message-ID: <g1pjbl$p1o$1@ger.gmane.org>

Hi,

there still is a plat-mac directory in Lib (though it's empty), and several
places in the tree refer to it.  Also, quite a few libs/scripts/tools in the
Mac subdir refer to modules that were removed in Python 3.

Some Mac head will need to do some additional cleanup before final release
(I'd do it, but as a non-Mac-user I can't judge well enough what is important).

Georg



From guido at python.org  Fri May 30 21:09:25 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 30 May 2008 12:09:25 -0700
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <483F4E3A.9090403@trueblade.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
	<483EAF95.5050503@trueblade.com>
	<3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>
	<483F06FF.9090007@trueblade.com>
	<78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com>
	<483F4E3A.9090403@trueblade.com>
Message-ID: <ca471dc20805301209j67c3e2a5nf819297680cbf3fb@mail.gmail.com>

I'd be fine with adding '#' back to the formatting language for hex and oct.

On Thu, May 29, 2008 at 5:45 PM, Eric Smith
<eric+python-dev at trueblade.com> wrote:
> wesley chun wrote:
>>
>> On 5/29/08, Eric Smith <eric+python-dev at trueblade.com> wrote:
>>>
>>> Marcin 'Qrczak' Kowalczyk wrote:
>>>>
>>>> Except that it works incorrectly for negative numbers.
>>
>> wow, that is a great point.  i didn't think of this either. it makes
>> it very inconvenient (see below) and makes it more difficult to say
>> we've completely replaced the '%' operator.
>>
>>
>>>  I can't really think of any other way that doesn't involve converting
>>> the
>>> number to a string and then operating on that, just to get the sign.
>>
>> here's one way of doing it without converting to a string first (it's ugly
>> too):
>>
>>>>> i = -45
>>>>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i))
>>
>> '-0x2d'
>
> Agreed, ick!
>
>> thx for putting it (back) in,
>
> I didn't say I would, I said I would if a decision was reached :)  I'd like
> to see some more consensus, and I hope that Talin (the PEP author) chimes
> in.
>
> Eric.
>
> _______________________________________________
> Python-3000 mailing list
> Python-3000 at python.org
> http://mail.python.org/mailman/listinfo/python-3000
> Unsubscribe:
> http://mail.python.org/mailman/options/python-3000/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From musiccomposition at gmail.com  Fri May 30 21:22:33 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Fri, 30 May 2008 14:22:33 -0500
Subject: [Python-3000] Mac module removal complete?
In-Reply-To: <g1pjbl$p1o$1@ger.gmane.org>
References: <g1pjbl$p1o$1@ger.gmane.org>
Message-ID: <1afaf6160805301222p2c8335e7h51ccec1a4110e563@mail.gmail.com>

On Fri, May 30, 2008 at 2:08 PM, Georg Brandl <g.brandl at gmx.net> wrote:
> Hi,
>
> there still is a plat-mac directory in Lib (though it's empty), and several
> places in the tree refer to it.  Also, quite a few libs/scripts/tools in the
> Mac subdir refer to modules that were removed in Python 3.

I'm pretty sure that plat-mac is going to go, but can Brett confirm?
>
> Some Mac head will need to do some additional cleanup before final release
> (I'd do it, but as a non-Mac-user I can't judge well enough what is
> important).

I can handle that.


-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From brett at python.org  Fri May 30 21:36:38 2008
From: brett at python.org (Brett Cannon)
Date: Fri, 30 May 2008 12:36:38 -0700
Subject: [Python-3000] Mac module removal complete?
In-Reply-To: <1afaf6160805301222p2c8335e7h51ccec1a4110e563@mail.gmail.com>
References: <g1pjbl$p1o$1@ger.gmane.org>
	<1afaf6160805301222p2c8335e7h51ccec1a4110e563@mail.gmail.com>
Message-ID: <bbaeab100805301236r30aab06cuf5da3cc10456a217@mail.gmail.com>

On Fri, May 30, 2008 at 12:22 PM, Benjamin Peterson
<musiccomposition at gmail.com> wrote:
> On Fri, May 30, 2008 at 2:08 PM, Georg Brandl <g.brandl at gmx.net> wrote:
>> Hi,
>>
>> there still is a plat-mac directory in Lib (though it's empty), and several
>> places in the tree refer to it.  Also, quite a few libs/scripts/tools in the
>> Mac subdir refer to modules that were removed in Python 3.
>
> I'm pretty sure that plat-mac is going to go, but can Brett confirm?

Ditch it. It's empty so there is no need to keep it. I am not even
sure how useful the Mac directory is at this point (but I have not
looked).

-Brett

From guido at python.org  Fri May 30 21:37:00 2008
From: guido at python.org (Guido van Rossum)
Date: Fri, 30 May 2008 12:37:00 -0700
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <48405420.8010800@trueblade.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>
	<483EAF95.5050503@trueblade.com>
	<3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>
	<483F06FF.9090007@trueblade.com>
	<78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com>
	<483F4E3A.9090403@trueblade.com>
	<ca471dc20805301209j67c3e2a5nf819297680cbf3fb@mail.gmail.com>
	<48405420.8010800@trueblade.com>
Message-ID: <ca471dc20805301237p4a30dd5du4f3ecb2771a967ca@mail.gmail.com>

Of course.
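
For reference, a minimal sketch of what the '#' flag looks like once the
format mini-language accepts it for hex, oct and bin. This assumes a Python 3
release that has the flag, which is where this thread is heading; the variable
names are only for illustration. The sign stays in front of the prefix, which
is the negative-number case the manual workaround quoted below was written for.

```python
# Alternate-form ('#') integer formatting via str.format() and format().
i = -45

hex_text = '{0:#x}'.format(i)    # sign first, then the 0x prefix
oct_text = '{0:#o}'.format(i)    # Python 3 style 0o prefix
bin_text = format(i, '#b')       # same flag through the format() builtin

# The manual workaround from the thread, kept for comparison.
manual = '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i))

print(hex_text, oct_text, bin_text, manual)
```

With the flag available, the sign handling no longer has to be done by hand.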

On Fri, May 30, 2008 at 12:23 PM, Eric Smith
<eric+python-dev at trueblade.com> wrote:
> Guido van Rossum wrote:
>>
>> I'd be fine with adding '#' back to the formatting language for hex and
>> oct.
>
> And bin, I assume?
>
>>
>> On Thu, May 29, 2008 at 5:45 PM, Eric Smith
>> <eric+python-dev at trueblade.com> wrote:
>>>
>>> wesley chun wrote:
>>>>
>>>> On 5/29/08, Eric Smith <eric+python-dev at trueblade.com> wrote:
>>>>>
>>>>> Marcin 'Qrczak' Kowalczyk wrote:
>>>>>>
>>>>>> Except that it works incorrectly for negative numbers.
>>>>
>>>> wow, that is a great point.  i didn't think of this either. it makes
>>>> it very inconvenient (see below) and makes it more difficult to say
>>>> we've completely replaced the '%' operator.
>>>>
>>>>
>>>>>  I can't really think of any other way that doesn't involve converting
>>>>> the
>>>>> number to a string and then operating on that, just to get the sign.
>>>>
>>>> here's one way of doing it without converting to a string first (it's
>>>> ugly
>>>> too):
>>>>
>>>>>>> i = -45
>>>>>>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i))
>>>>
>>>> '-0x2d'
>>>
>>> Agreed, ick!
>>>
>>>> thx for putting it (back) in,
>>>
>>> I didn't say I would, I said I would if a decision was reached :)  I'd
>>> like
>>> to see some more consensus, and I hope that Talin (the PEP author) chimes
>>> in.
>>>
>>> Eric.


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From eric+python-dev at trueblade.com  Fri May 30 21:23:12 2008
From: eric+python-dev at trueblade.com (Eric Smith)
Date: Fri, 30 May 2008 15:23:12 -0400
Subject: [Python-3000] PEP 3101 str.format() equivalent of '%#o/x/X'?
In-Reply-To: <ca471dc20805301209j67c3e2a5nf819297680cbf3fb@mail.gmail.com>
References: <78b3a9580805290156x6790a6c4k5c9ae49e79fa21d4@mail.gmail.com>	
	<483EAF95.5050503@trueblade.com>	
	<3f4107910805291116w37af38e3qb2c5145539d7e23e@mail.gmail.com>	
	<483F06FF.9090007@trueblade.com>	
	<78b3a9580805291634t3ed65ab3x29bedc80be887f8@mail.gmail.com>	
	<483F4E3A.9090403@trueblade.com>
	<ca471dc20805301209j67c3e2a5nf819297680cbf3fb@mail.gmail.com>
Message-ID: <48405420.8010800@trueblade.com>

Guido van Rossum wrote:
> I'd be fine with adding '#' back to the formatting language for hex and oct.

And bin, I assume?

> 
> On Thu, May 29, 2008 at 5:45 PM, Eric Smith
> <eric+python-dev at trueblade.com> wrote:
>> wesley chun wrote:
>>> On 5/29/08, Eric Smith <eric+python-dev at trueblade.com> wrote:
>>>> Marcin 'Qrczak' Kowalczyk wrote:
>>>>> Except that it works incorrectly for negative numbers.
>>> wow, that is a great point.  i didn't think of this either. it makes
>>> it very inconvenient (see below) and makes it more difficult to say
>>> we've completely replaced the '%' operator.
>>>
>>>
>>>>  I can't really think of any other way that doesn't involve converting
>>>> the
>>>> number to a string and then operating on that, just to get the sign.
>>> here's one way of doing it without converting to a string first (it's ugly
>>> too):
>>>
>>>>>> i = -45
>>>>>> '{0}0x{1:x}'.format('-' if i < 0 else '', abs(i))
>>> '-0x2d'
>> Agreed, ick!
>>
>>> thx for putting it (back) in,
>> I didn't say I would, I said I would if a decision was reached :)  I'd like
>> to see some more consensus, and I hope that Talin (the PEP author) chimes
>> in.
>>
>> Eric.


From musiccomposition at gmail.com  Sat May 31 02:22:37 2008
From: musiccomposition at gmail.com (Benjamin Peterson)
Date: Fri, 30 May 2008 19:22:37 -0500
Subject: [Python-3000] Mac module removal complete?
In-Reply-To: <bbaeab100805301236r30aab06cuf5da3cc10456a217@mail.gmail.com>
References: <g1pjbl$p1o$1@ger.gmane.org>
	<1afaf6160805301222p2c8335e7h51ccec1a4110e563@mail.gmail.com>
	<bbaeab100805301236r30aab06cuf5da3cc10456a217@mail.gmail.com>
Message-ID: <1afaf6160805301722m1ad4d33fp264907c84e1faf9c@mail.gmail.com>

On Fri, May 30, 2008 at 2:36 PM, Brett Cannon <brett at python.org> wrote:
> On Fri, May 30, 2008 at 12:22 PM, Benjamin Peterson
>>
>> I'm pretty sure that plat-mac is going to go, but can Brett confirm?
>
> Ditch it. It's empty so there is no need to keep it. I am not even
> sure how useful the Mac directory is at this point (but I have not
> looked).

I did remove references to plat-mac in the Makefile/configure.
However, lib-darwin is auto-generated on install. I'm not really sure
if this is still needed.


-- 
Cheers,
Benjamin Peterson
"There's no place like 127.0.0.1."

From solipsis at pitrou.net  Sat May 31 03:33:14 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 31 May 2008 01:33:14 +0000 (UTC)
Subject: [Python-3000] Exception re-raising woes
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>
Message-ID: <loom.20080531T012237-406@post.gmane.org>

Guido van Rossum <guido <at> python.org> writes:
> I would be okay as well with restricting bare raise syntactically to
> appearing only inside an except block, to emphasize the change in
> semantics that was started when we decided to make the optional
> variable disappear at the end of the except block.
> 
> This would render the following code illegal:
> 
> def f():
>   try: 1/0
>   except: pass
>   raise

But you may want to use bare raise in a function called from an exception
handler, e.g.:

def handle_exception():
    if user() == "Albert":
        # Albert likes his exceptions uncooked
        raise
    else:
        logging.exception("an exception occurred")

def f():
    try:
        raise KeyError
    except:
        handle_exception()
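
A runnable sketch of the two cases under discussion, assuming a current
Python 3, where bare raise is checked at run time rather than syntactically.
The reraise flag is a hypothetical stand-in for the user() check above, and
bare_raise_outside_handler() mirrors the f() from the quoted proposal.

```python
import logging

def handle_exception(reraise):
    # Called from inside an except block, so the caught exception is still active.
    if reraise:
        raise  # bare raise re-raises the exception the caller is handling
    logging.exception("an exception occurred")

def f(reraise):
    try:
        raise KeyError("missing")
    except KeyError:
        handle_exception(reraise)
    return "handled"

def bare_raise_outside_handler():
    try:
        1 / 0
    except ZeroDivisionError:
        pass
    raise  # the handler has finished, so nothing is left to re-raise
```

Calling f(True) propagates the KeyError out of the helper, f(False) logs it
and returns normally, and bare_raise_outside_handler() fails with a
RuntimeError because no exception is active when the bare raise runs.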


Antoine.


From rhamph at gmail.com  Sat May 31 03:44:22 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Fri, 30 May 2008 19:44:22 -0600
Subject: [Python-3000] Exception re-raising woes
In-Reply-To: <loom.20080531T012237-406@post.gmane.org>
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>
	<loom.20080531T012237-406@post.gmane.org>
Message-ID: <aac2c7cb0805301844o50b80f6cpbcef72bcdfeceecc@mail.gmail.com>

On Fri, May 30, 2008 at 7:33 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Guido van Rossum <guido <at> python.org> writes:
>> I would be okay as well with restricting bare raise syntactically to
>> appearing only inside an except block, to emphasize the change in
>> semantics that was started when we decided to make the optional
>> variable disappear at the end of the except block.
>>
>> This would render the following code illegal:
>>
>> def f():
>>   try: 1/0
>>   except: pass
>>   raise
>
> But you may want to use bare raise in a function called from an exception
> handler, e.g.:
>
> def handle_exception():
>    if user() == "Albert":
>        # Albert likes his exceptions uncooked
>        raise
>    else:
>        logging.exception("an exception occurred")
>
> def f():
>    try:
>        raise KeyError
>    except:
>        handle_exception()

This can be rewritten to use sys.exc_info(), ie:

def handle_exception():
    if user() == "Albert":
        # Albert likes his exceptions uncooked
        raise sys.exc_info()[1]
    else:
        logging.exception("an exception occurred")

-- 
Adam Olsen, aka Rhamphoryncus

From mhammond at skippinet.com.au  Sat May 31 08:43:43 2008
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Sat, 31 May 2008 16:43:43 +1000
Subject: [Python-3000] Exception re-raising woes
In-Reply-To: <aac2c7cb0805301844o50b80f6cpbcef72bcdfeceecc@mail.gmail.com>
References: <loom.20080526T091655-961@post.gmane.org>	<loom.20080530T094609-317@post.gmane.org>	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>	<loom.20080531T012237-406@post.gmane.org>
	<aac2c7cb0805301844o50b80f6cpbcef72bcdfeceecc@mail.gmail.com>
Message-ID: <003a01c8c2e9$afc2d910$0f488b30$@com.au>

> >> This would render the following code illegal:
> >>
> >> def f():
> >>   try: 1/0
> >>   except: pass
> >>   raise
> >
> > But you may want to use bare raise in a function called from an exception
> > handler, e.g.:
> >
> > def handle_exception():
> >    if user() == "Albert":
> >        # Albert likes his exceptions uncooked
> >        raise
> >    else:
> >        logging.exception("an exception occurred")
> >
> > def f():
> >    try:
> >        raise KeyError
> >    except:
> >        handle_exception()
> 
> This can be rewritten to use sys.exc_info(), ie:
> 
> def handle_exception():
>     if user() == "Albert":
>         # Albert likes his exceptions uncooked
>         raise sys.exc_info()[1]
>     else:
>         logging.exception("an exception occurred")

In both Python 2.x and 3 (a few months old build of Py3k though), the
traceback isn't the same.  For Python 2.0 you could write it like:

def handle_exception():
...
    raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2]

It's not clear how that would be spelt in py3k though (and from what I can
see, sys.exc_info() itself has an uncertain future in py3k).
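
One possible spelling, sketched under the assumption of Python 3 as it was
eventually released: the exception object carries its traceback in
__traceback__, and BaseException.with_traceback() attaches one explicitly.
The frames_seen() helper is hypothetical and only exists to inspect the
result.

```python
import sys
import traceback

def handle_exception():
    exc_type, exc_value, exc_tb = sys.exc_info()
    # Python 3 counterpart of the three-argument raise: attach the
    # original traceback to the exception object before re-raising it.
    raise exc_value.with_traceback(exc_tb)

def f():
    try:
        raise KeyError("missing")
    except KeyError:
        handle_exception()

def frames_seen():
    # Function names recorded in the traceback that reaches the caller.
    try:
        f()
    except KeyError as exc:
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []
```

The names returned by frames_seen() include both f and handle_exception, so
the frame of the original raise is not lost when the helper re-raises.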

Cheers,

Mark


From stefan_ml at behnel.de  Sat May 31 15:36:58 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 31 May 2008 15:36:58 +0200
Subject: [Python-3000] doctest portability
Message-ID: <g1rk9q$lti$1@ger.gmane.org>

Hi,

I currently use a bunch of work-arounds for doctests in lxml's test suite to
make them work in Py3. I converted most tests to a mix of Py2 and Py3 syntax
(e.g. using both u'' and b'' literals), and most of the runtime work is done
using regular expressions that convert the except-as syntax, strip package
names from tracebacks and translate bytes/str output between Py2 and Py3
syntax/repr.

I know, I could use the lib2to3 package, but it a) is a one-way tool in the
wrong direction if you have to distinguish bytes/str literals, b) lacks
configurability stating exactly what changes need to be done and c) seemed
harder to set up for doctests than doing the conversion by hand. It would be
really nice if the doctest module had a simple option that specified if the
doctests of a test suite are in Py2 or Py3 syntax, and then just did the right
thing under Py3 (and maybe also 2.6). Otherwise, a lot more people than just
myself will have a hard time getting their test suites to run in Py3, which is
basically the only way to sanely migrate code.
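
To illustrate the kind of work-around described above, here is a minimal
sketch of a doctest output checker that ignores u'' and b'' prefixes when
comparing expected and actual output. It is not lxml's actual code; the
class name, the regular expression and run_snippet() are assumptions made
for this example. Only doctest.OutputChecker, doctest.DocTestParser and
doctest.DocTestRunner come from the standard library.

```python
import doctest
import re

class PrefixInsensitiveChecker(doctest.OutputChecker):
    """Compare doctest output while ignoring u'' and b'' literal prefixes."""

    # A lone u or b directly in front of a quote character.
    _prefix = re.compile(r"\b[ub](?=['\"])")

    def check_output(self, want, got, optionflags):
        want = self._prefix.sub("", want)
        got = self._prefix.sub("", got)
        return super().check_output(want, got, optionflags)

def run_snippet(source):
    # Parse a doctest from a string and run it with the lenient checker.
    test = doctest.DocTestParser().get_doctest(source, {}, "snippet", None, 0)
    runner = doctest.DocTestRunner(checker=PrefixInsensitiveChecker(),
                                   verbose=False)
    return runner.run(test)
```

For example, run_snippet(">>> 'abc'\nu'abc'\n") reports no failures under
Python 3 although the expected output was written in Py2 style. The approach
is lossy: it no longer distinguishes bytes from str, which is exactly the
distinction that makes a general solution hard.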

Stefan


From regebro at gmail.com  Sat May 31 15:52:59 2008
From: regebro at gmail.com (Lennart Regebro)
Date: Sat, 31 May 2008 15:52:59 +0200
Subject: [Python-3000] Proposal to add __str__ method to iterables.
In-Reply-To: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com>
References: <7BA0A945-F162-4B00-B0F2-1C8AB496834C@carlsensei.com>
Message-ID: <319e029f0805310652k5c6b4fa6x4e7f597ebc7b344c@mail.gmail.com>

On Wed, May 28, 2008 at 5:48 AM, Carl Johnson <carl at carlsensei.com> wrote:
> Proposed behavior of the __str__ method for iterables is that it returns the
> result of "".join(str(i) for i in self).

In 8-9 years of python programming I have probably never needed to do
"".join(str(i) for i in self), so even if there was a __str__ on
iterables, this seems to me to be a particularly useless default. :)

> In order to replicate the behavior of filter with a comprehension

Instead of running filter on a string, you can replace the offending
character with an empty string.

filter(lambda c: c!="a", "abracadbra")
'brcdbr'

"abracadbra".replace('a', '')
'brcdbr'
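
For comparison, a short sketch of the same operation in Python 3, where
filter() returns an iterator rather than a string, so the characters have to
be joined back together. The variable names are illustrative only.

```python
text = "abracadbra"

# filter() yields characters one at a time in Python 3, so join them back up.
filtered = "".join(filter(lambda c: c != "a", text))

# Removing a single known character is simpler with str.replace().
replaced = text.replace("a", "")

# For several characters at once, str.translate() with a deletion table works.
translated = text.translate(str.maketrans("", "", "ab"))

print(filtered, replaced, translated)
```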

I find using filter on a string kinda strange, I have to say. How
often do you have to filter away certain characters in a string?
Never happened to me.

-- 
Lennart Regebro: Zope and Plone consulting.
http://www.colliberty.com/
+33 661 58 14 64

From solipsis at pitrou.net  Sat May 31 15:59:24 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 31 May 2008 13:59:24 +0000 (UTC)
Subject: [Python-3000] sys.exc_info()
References: <loom.20080526T091655-961@post.gmane.org>	<loom.20080530T094609-317@post.gmane.org>	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>	<loom.20080531T012237-406@post.gmane.org>
	<aac2c7cb0805301844o50b80f6cpbcef72bcdfeceecc@mail.gmail.com>
	<003a01c8c2e9$afc2d910$0f488b30$@com.au>
Message-ID: <loom.20080531T131423-489@post.gmane.org>

Mark Hammond <mhammond <at> skippinet.com.au> writes:
> In both Python 2.x and 3 (a few months old build of Py3k though), the
> traceback isn't the same.  For Python 2.0 you could write it like:
> 
> def handle_exception():
> ...
>     raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2]
> 
> Its not clear how that would be spelt in py3k though (and from what I can
> see, sys.exc_info() itself has an uncertain future in py3k).

sys.exc_info() will remain, it's just that the returned value will be (None,
None, None) if we are not in an except block in any of the currently active
frames in the thread. In the case above it would return the current exception
(the one caught in one of the enclosing frames).

By the way, another interesting sys.exc_info() case:

def except_yield():
    try:
        raise TypeError
    except:
        yield 1

def f():
    for i in except_yield():
        return sys.exc_info()

Right now, running f() returns (None, None, None). But with rewritten exception
stacking, it may return the 3-tuple for the TypeError raised in except_yield().
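
A sketch for checking this on a given interpreter. It records what the
generator sees inside its handler next to what the caller sees at the same
moment; observe() is a hypothetical helper. To my knowledge, released
CPython 3 versions keep the generator's exception state separate from the
caller's, so the caller side stays None while the generator side stays
TypeError, including after the generator is resumed.

```python
import sys

def except_yield():
    try:
        raise TypeError
    except TypeError:
        yield sys.exc_info()[0]   # evaluated inside the handler
        yield sys.exc_info()[0]   # evaluated after the generator is resumed

def observe():
    # Pair the generator's view with the caller's view of sys.exc_info().
    return [(inside, sys.exc_info()[0]) for inside in except_yield()]
```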

Regards

Antoine.


From g.brandl at gmx.net  Sat May 31 17:13:38 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 31 May 2008 17:13:38 +0200
Subject: [Python-3000] doctest portability
In-Reply-To: <g1rk9q$lti$1@ger.gmane.org>
References: <g1rk9q$lti$1@ger.gmane.org>
Message-ID: <g1rpv7$5mp$1@ger.gmane.org>

Stefan Behnel schrieb:
> Hi,
> 
> I currently use a bunch of work-arounds for doctests in lxml's test suite to
> make them work in Py3. I converted most tests to a mix of Py2 and Py3 syntax
> (e.g. using both u'' and b'' literals), and most of the runtime work is done
> using regular expressions that convert the except-as syntax, strip package
> names from tracebacks and translate bytes/str output between Py2 and Py3
> syntax/repr.
> 
> I know, I could use the lib2to3 package, but it a) is a one-way tool in the
> wrong direction if you have to distinguish bytes/str literals, b) lacks
> configurability stating exactly what changes need to be done and c) seemed
> harder to set up for doctests than doing the conversion by hand.

Shouldn't the -d option handle doctests without further set-up?

Georg


-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From g.brandl at gmx.net  Sat May 31 17:33:38 2008
From: g.brandl at gmx.net (Georg Brandl)
Date: Sat, 31 May 2008 17:33:38 +0200
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <bbaeab100805291032v7749ba6n1752a0ef5be9cbca@mail.gmail.com>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>	<g1ll0v$i0p$1@ger.gmane.org>
	<bbaeab100805291032v7749ba6n1752a0ef5be9cbca@mail.gmail.com>
Message-ID: <g1rr4o$956$1@ger.gmane.org>

Brett Cannon schrieb:

>>> Issue 2873 - htmllib is slated to go, but pydoc still uses it. Then
>>> again, pydoc is busted thanks to the new doc format.
>>
>> I will try to handle this in the coming week.
>>
> 
> Fred had the interesting suggestion of removing pydoc in Py3K based on
> the thinking that documentation tools like pydoc should be external to
> Python. With the docs now so easy to generate directly, should pydoc
> perhaps just be gutted to only what is needed for help() to work?

pydoc is fine for displaying docstring help, and interactive help.
This should stay.

Of course, it would also be nice for ``help("if")`` to work effortlessly,
which it currently only does if the generated HTML documentation is
available somewhere, which it typically isn't -- on Unix most distributions
put it in a separate package (from which pydoc won't always find it
on its own), on Windows only the CHM file is distributed and must be
decompiled to get single HTML files.

Now that the docs are reST, the source is almost pretty enough to display
it raw, but I could also imagine a "text" writer that removes the more
obscure markup to present a casual-reader-friendly text version.

The needed sources could then be distributed with Python -- it shouldn't
be more than about 200 kb.

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.


From steve at holdenweb.com  Sat May 31 17:42:24 2008
From: steve at holdenweb.com (Steve Holden)
Date: Sat, 31 May 2008 11:42:24 -0400
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <g1rr4o$956$1@ger.gmane.org>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>	<g1ll0v$i0p$1@ger.gmane.org>	<bbaeab100805291032v7749ba6n1752a0ef5be9cbca@mail.gmail.com>
	<g1rr4o$956$1@ger.gmane.org>
Message-ID: <484171E0.4050204@holdenweb.com>

Georg Brandl wrote:
> Brett Cannon schrieb:
> 
>>>> Issue 2873 - htmllib is slated to go, but pydoc still uses it. Then
>>>> again, pydoc is busted thanks to the new doc format.
>>>
>>> I will try to handle this in the coming week.
>>>
>>
>> Fred had the interesting suggestion of removing pydoc in Py3K based on
>> the thinking that documentation tools like pydoc should be external to
>> Python. With the docs now so easy to generate directly, should pydoc
>> perhaps just be gutted to only what is needed for help() to work?
> 
> pydoc is fine for displaying docstring help, and interactive help.
> This should stay.
> 
> Of course, it would also be nice for ``help("if")`` to work effortlessly,
> which it currently only does if the generated HTML documentation is
> available somewhere, which it typically isn't -- on Unix most distributions
> put it in a separate package (from which pydoc won't always find it
> of its own), on Windows only the CHM file is distributed and must be
> decompiled to get single HTML files.
> 
> Now that the docs are reST, the source is almost pretty enough to display
> it raw, but I could also imagine a "text" writer that removes the more
> obscure markup to present a casual-reader-friendly text version.
> 
> The needed sources could then be distributed with Python -- it shouldn't
> be more than about 200 kb.

The versioned documentation will sometimes be available from the 
Internet if you want to think about using that as a fallback source. It 
*would* be nice if help("if") worked.

It would be even handier if help(if) worked, but that's a syntax 
problem, and it would be a horrendous one to overcome, I suspect.

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/


From stefan_ml at behnel.de  Sat May 31 17:47:06 2008
From: stefan_ml at behnel.de (Stefan Behnel)
Date: Sat, 31 May 2008 17:47:06 +0200
Subject: [Python-3000] doctest portability
In-Reply-To: <g1rpv7$5mp$1@ger.gmane.org>
References: <g1rk9q$lti$1@ger.gmane.org> <g1rpv7$5mp$1@ger.gmane.org>
Message-ID: <g1rrtq$aoe$1@ger.gmane.org>

Georg Brandl wrote:
> Stefan Behnel schrieb:
>> I know, I could use the lib2to3 package, but it a) is a one-way tool in the
>> wrong direction if you have to distinguish bytes/str literals, b) lacks
>> configurability stating exactly what changes need to be done and c) seemed
>> harder to set up for doctests than doing the conversion by hand.
> 
> Shouldn't the -d option handle doctests without further set-up?

If you start 2to3 from the command prompt to convert the files that contain
the doctests and copy them to a new location, then yes. But the question is:
how do you run a Py2 doctest in Py3 without first copying your doctests or
doctest containing sources to new files and then running the tests from there?
You can't require people to put such a work-around into every test script in
the world. Adding an option, fine. Copying files, adapting paths and all that,
why?

Stefan


From guido at python.org  Sat May 31 18:32:13 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 31 May 2008 09:32:13 -0700
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
	<fb6fbf560805271312g41c6b240h322c7142d1cf98be@mail.gmail.com>
	<797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com>
	<797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com>
Message-ID: <ca471dc20805310932q18f1ba44r317eac65f1209258@mail.gmail.com>

Hi Atsuo,

I'm very close to accepting your PEP. I have a few questions:

- The Rationale has a more elaborate (and perhaps slightly
conflicting, regarding the status of ASCII space?) definition of
non-printable than the Specification. Perhaps these could
be merged?

- I'm still not comfortable with making stdout default to
backslashreplace. Stuff written to stdout might be consumed by another
program that might misinterpret the \ escapes. Previously I thought I
was okay with doing this only if stdout.isatty() returns True, but I
think that would just add confusion of the kind "it works in
interactive mode but not when redirecting to a file".  I'm okay with
apps that think they need this setting it explicitly, but not with
having it be the default. (For stderr however I agree that
backslashreplace is the right default.)

- What happens to Unicode characters that are "unassigned"? I assume
there are many of those, especially outside the basic plane. Shouldn't
we be conservative and convert these to \u or \U escapes as well?
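
To make the stdout concern concrete, a small sketch of the two error
handlers, using an in-memory stream in place of a real stdout. The sample
text and the variable names are assumptions for the example.

```python
import io

text = "caf\u00e9 \u3042"

# 'backslashreplace' never fails: unencodable characters become
# \x, \u or \U escapes in the output.
escaped = text.encode("ascii", errors="backslashreplace")

# 'strict' raises instead, so a consumer never sees escapes it might misread.
try:
    text.encode("ascii", errors="strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# An application that wants escapes on a stream can opt in explicitly.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="ascii", errors="backslashreplace")
stream.write(text)
stream.flush()
written = buffer.getvalue()
```

A program reading the escaped bytes downstream cannot tell an escape produced
by the handler from a literal backslash sequence in the data, which is the
misinterpretation risk raised above.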

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Sat May 31 19:24:32 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 31 May 2008 11:24:32 -0600
Subject: [Python-3000] sys.exc_info()
In-Reply-To: <loom.20080531T131423-489@post.gmane.org>
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>
	<loom.20080531T012237-406@post.gmane.org>
	<aac2c7cb0805301844o50b80f6cpbcef72bcdfeceecc@mail.gmail.com>
	<003a01c8c2e9$afc2d910$0f488b30$@com.au>
	<loom.20080531T131423-489@post.gmane.org>
Message-ID: <aac2c7cb0805311024mb699d8euf6a896909051dfb5@mail.gmail.com>

On Sat, May 31, 2008 at 7:59 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Mark Hammond <mhammond <at> skippinet.com.au> writes:
>> In both Python 2.x and 3 (a few months old build of Py3k though), the
>> traceback isn't the same.  For Python 2.0 you could write it like:
>>
>> def handle_exception():
>> ...
>>     raise sys.exc_info()[0], sys.exc_info()[1], sys.exc_info()[2]
>>
>> Its not clear how that would be spelt in py3k though (and from what I can
>> see, sys.exc_info() itself has an uncertain future in py3k).
>
> sys.exc_info() will remain, it's just that the returned value will be (None,
> None, None) if we are not in an except block in any of the currently active
> frames in the thread. In the case above it would return the current exception
> (the one caught in one of the enclosing frames).
>
> By the way, another interesting sys.exc_info() case:
>
> def except_yield():
>    try:
>        raise TypeError
>    except:
>        yield 1
>
> def f():
>    for i in except_yield():
>        return sys.exc_info()
>
> Right now, running f() returns (None, None, None). But with rewritten exception
> stacking, it may return the 3-tuple for the TypeError raised in except_yield().

What exception stacking?  I thought we'd be using a simple per-thread
exception.  I'd expect the yield statement to clear it, giving us
(None, None, None).


-- 
Adam Olsen, aka Rhamphoryncus

From ishimoto at gembook.org  Sat May 31 19:30:05 2008
From: ishimoto at gembook.org (Atsuo Ishimoto)
Date: Sun, 1 Jun 2008 02:30:05 +0900
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <ca471dc20805310932q18f1ba44r317eac65f1209258@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
	<fb6fbf560805271312g41c6b240h322c7142d1cf98be@mail.gmail.com>
	<797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com>
	<797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com>
	<ca471dc20805310932q18f1ba44r317eac65f1209258@mail.gmail.com>
Message-ID: <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>

Hi Guido,

On Sun, Jun 1, 2008 at 1:32 AM, Guido van Rossum <guido at python.org> wrote:
> Hi Atsuo,
>
> I'm very close to accepting your PEP. I have a few questions:

Great!

>
> - The Rationale has a more elaborate (and perhaps slightly
> conflicting, regarding the status of ASCII space?) definition of our
> definition of non-printable than the Specification. Perhaps this could
> be merged?
>

Yes, I'll merge them.

> - I'm still not comfortable with making stdout default to
> backslashreplace. Stuff written to stdout might be consumed by another
> program that might misinterpret the \ escapes. Previously I thought I
> was okay with doing this only if stdout.isatty() returns True, but I
> think that would just add confusion of the kind "it works in
> interactive mode but not when redirecting to a file".  I'm okay with
> apps who think they need this setting that explicitly, but not to
> having it be the default. (For stderr however I agree that
> backslashreplace is the right default.)

Okay, we'll keep 'strict' as default error handler for stdout always,
then. I can live with it.
But, my $0.02, I expect this issue will be revisited after people
start to develop real applications with Python 3.x.

>
> - What happens to Unicode characters that are "unassigned"? I assume
> there are many of those, especially outside the basic plane. Shouldn't
> we be conservative and convert these to \u or \U escapes as well?
>

Unassigned characters are defined as 'Cn' in the Unicode database and
they will be escaped.
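
A short sketch of how this can be checked with the unicodedata module,
assuming a Python 3 in which the PEP's repr() behaviour is implemented.
U+FFFF is used as the sample because it is a noncharacter, which the Unicode
database also puts in category Cn and which will not be assigned later. The
report list is only for illustration.

```python
import unicodedata

samples = [
    "\u3042",   # HIRAGANA LETTER A, an assigned letter
    "\uffff",   # noncharacter, category Cn like unassigned code points
    "\x07",     # BEL, a control character
]

# For each sample: general category, printability, and ASCII-only repr.
report = [
    (unicodedata.category(ch), ch.isprintable(), ascii(repr(ch)))
    for ch in samples
]

category_of_nonchar = unicodedata.category("\uffff")
repr_of_nonchar = repr("\uffff")      # escaped because it is not printable
repr_of_letter = repr("\u3042")       # left as the character itself
```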

I'll update the PEP and the patch on Sunday. Thank you!

From solipsis at pitrou.net  Sat May 31 19:41:46 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 31 May 2008 17:41:46 +0000 (UTC)
Subject: [Python-3000] sys.exc_info()
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>
	<loom.20080531T012237-406@post.gmane.org>
	<aac2c7cb0805301844o50b80f6cpbcef72bcdfeceecc@mail.gmail.com>
	<003a01c8c2e9$afc2d910$0f488b30$@com.au>
	<loom.20080531T131423-489@post.gmane.org>
	<aac2c7cb0805311024mb699d8euf6a896909051dfb5@mail.gmail.com>
Message-ID: <loom.20080531T173224-563@post.gmane.org>

Adam Olsen <rhamph <at> gmail.com> writes:
> > By the way, another interesting sys.exc_info() case:
> >
> > def except_yield():
> >    try:
> >        raise TypeError
> >    except:
> >        yield 1
> >
> > def f():
> >    for i in except_yield():
> >        return sys.exc_info()
> >
> > Right now, running f() returns (None, None, None). But with rewritten
> > exception stacking, it may return the 3-tuple for the TypeError raised in
> > except_yield().
> 
> What exception stacking?  I thought we'd be using a simple per-thread
> exception.  I'd expect the yield statement to clear it, giving us
> (None, None, None).

There is a per-thread exception for the current exception state but we
must also save and restore the previous state when we enter and leave
an exception handler, respectively, so that re-raising and sys.exc_info()
work properly in situations with lexically nested exception handlers.
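
To make the save/restore requirement concrete, here is a small sketch. It assumes the semantics of the Python 3 releases as they eventually shipped, not the state of the interpreter at the time of this thread; the function name nested_handlers is made up for the illustration.

```python
import sys

def nested_handlers():
    try:
        raise TypeError("outer")
    except TypeError:
        try:
            raise ValueError("inner")
        except ValueError:
            # Inside the inner handler the current exception is the ValueError.
            inner_type = sys.exc_info()[0]
        # Back in the outer handler: the previous state has been restored.
        outer_type = sys.exc_info()[0]
    # Outside any handler, no exception is being handled.
    after_type = sys.exc_info()[0]
    return inner_type, outer_type, after_type

result = nested_handlers()
print(result)
```

A bare "raise" placed where outer_type is read would re-raise the TypeError for the same reason.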

Also, "yield" cannot blindly clear the exception state, because the frame
calling the generator may expect the exception state to be non-None.
Consequently, we might have to keep the f_exc_* members solely for the
generator case.
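
A sketch of the generator case, again assuming the behaviour of the Python 3 releases as they eventually shipped: the generator keeps its own exception state across a yield, while the calling frame sees only its own. The variant of except_yield below yields what it observes itself, which is an addition for the illustration.

```python
import sys

def except_yield():
    try:
        raise TypeError
    except TypeError:
        # The generator is suspended while still handling the TypeError,
        # and sees it again after being resumed.
        yield sys.exc_info()[0]
        yield sys.exc_info()[0]

def f():
    seen_inside = []
    seen_outside = []
    for exc_type in except_yield():
        seen_inside.append(exc_type)            # state inside the generator
        seen_outside.append(sys.exc_info()[0])  # state in the calling frame
    return seen_inside, seen_outside

seen_inside, seen_outside = f()
print(seen_inside, seen_outside)
```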


From rhamph at gmail.com  Sat May 31 21:35:36 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 31 May 2008 13:35:36 -0600
Subject: [Python-3000] sys.exc_info()
In-Reply-To: <loom.20080531T173224-563@post.gmane.org>
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>
	<loom.20080531T012237-406@post.gmane.org>
	<aac2c7cb0805301844o50b80f6cpbcef72bcdfeceecc@mail.gmail.com>
	<003a01c8c2e9$afc2d910$0f488b30$@com.au>
	<loom.20080531T131423-489@post.gmane.org>
	<aac2c7cb0805311024mb699d8euf6a896909051dfb5@mail.gmail.com>
	<loom.20080531T173224-563@post.gmane.org>
Message-ID: <aac2c7cb0805311235x31ac0151p45fe58cd2e879c99@mail.gmail.com>

On Sat, May 31, 2008 at 11:41 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Adam Olsen <rhamph <at> gmail.com> writes:
>> > By the way, another interesting sys.exc_info() case:
>> >
>> > def except_yield():
>> >    try:
>> >        raise TypeError
>> >    except:
>> >        yield 1
>> >
>> > def f():
>> >    for i in except_yield():
>> >        return sys.exc_info()
>> >
>> > Right now, running f() returns (None, None, None). But with rewritten
>> > exception stacking, it may return the 3-tuple for the TypeError raised in
>> > except_yield().
>>
>> What exception stacking?  I thought we'd be using a simple per-thread
>> exception.  I'd expect the yield statement to clear it, giving us
>> (None, None, None).
>
> There is a per-thread exception for the current exception state but we
> must also save and restore the previous state when we enter and leave
> an exception handler, respectively, so that re-raising and sys.exc_info()
> work properly in situations with lexically nested exception handlers.

The bytecode generation for "raise" could be changed to be literally the
same as "except Foo as e: raise e".  Reuse our existing stack rather than
adding another one.

sys.exc_info() won't get clobbered until another exception gets
raised.  I see no reason why this needs to return anything other than
(None, None, None):

def x():
    try:
        ...
    except:
        try:
            ...
        except:
            pass
        #raise
        return sys.exc_info()

The commented out raise should use the outer except block (and thus be
lexically based), but sys.exc_info() doesn't have to be.  If you want
it to work, use it *immediately* at the start of the block.


> Also, "yield" cannot blindly clear the exception state, because the frame
> calling the generator may expect the exception state to be non-None.
> Consequently, we might have to keep the f_exc_* members solely for the
> generator case.

Why?  Why should the frame calling the generator be inspecting the
exception state of the generator?  What's the use case?


-- 
Adam Olsen, aka Rhamphoryncus

From solipsis at pitrou.net  Sat May 31 22:03:49 2008
From: solipsis at pitrou.net (Antoine Pitrou)
Date: Sat, 31 May 2008 20:03:49 +0000 (UTC)
Subject: [Python-3000] sys.exc_info()
References: <loom.20080526T091655-961@post.gmane.org>
	<loom.20080530T094609-317@post.gmane.org>
	<ca471dc20805301030q26271bfcx907250ddd3f9875a@mail.gmail.com>
	<loom.20080531T012237-406@post.gmane.org>
	<aac2c7cb0805301844o50b80f6cpbcef72bcdfeceecc@mail.gmail.com>
	<003a01c8c2e9$afc2d910$0f488b30$@com.au>
	<loom.20080531T131423-489@post.gmane.org>
	<aac2c7cb0805311024mb699d8euf6a896909051dfb5@mail.gmail.com>
	<loom.20080531T173224-563@post.gmane.org>
	<aac2c7cb0805311235x31ac0151p45fe58cd2e879c99@mail.gmail.com>
Message-ID: <loom.20080531T195123-123@post.gmane.org>

Adam Olsen <rhamph <at> gmail.com> writes:
> 
> The bytecode generation for "raise" could be changed to be literally the
> same as "except Foo as e: raise e".  Reuse our existing stack rather than
> adding another one.

As someone else pointed out, there is a difference between the two constructs: the
latter appends a line to the traceback while the former doesn't. I suppose in
some contexts it can be useful (especially if the exception is re-raised several
times because of a complex architecture, e.g. a framework).
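
The difference can be observed by counting traceback entries. This is a sketch assuming CPython 3.x behaviour as eventually shipped; the helper names (fail, reraise_bare, reraise_named, traceback_depth) are made up for the illustration.

```python
import traceback

def fail():
    raise ValueError("boom")

def reraise_bare():
    try:
        fail()
    except ValueError:
        raise        # bare re-raise: no new traceback entry

def reraise_named():
    try:
        fail()
    except ValueError as e:
        raise e      # adds an entry for this line to the traceback

def traceback_depth(func):
    """Call func and return the number of entries in the resulting traceback."""
    try:
        func()
    except ValueError as exc:
        return len(traceback.extract_tb(exc.__traceback__))

bare_depth = traceback_depth(reraise_bare)
named_depth = traceback_depth(reraise_named)
print(bare_depth, named_depth)
```

The second count should be one higher than the first, the extra entry being the "raise e" line.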

> The commented out raise should use the outer except block (and thus be
> lexically based), but sys.exc_info() doesn't have to be.

But would you object to sys.exc_info() being lexically based as well?
I say that because the bare "raise" statement and sys.exc_info() use the same
attributes internally, so they will have the same semantics unless we decide
it's better to do otherwise.

> > Also, "yield" cannot blindly clear the exception state, because the frame
> > calling the generator may expect the exception state to be non-None.
> > Consequently, we might have to keep the f_exc_* members solely for the
> > generator case.
> 
> Why?  Why should the frame calling the generator be inspecting the
> exception state of the generator?  What's the use case?

You misunderstood me. The f_exc_* fields will be used internally to swap between
the inner generator's exception state and the calling frame's own exception
state. They will have no useful meaning for outside code, so I suggest that they
no longer be accessible from Python code.

Regards

Antoine.


From guido at python.org  Sat May 31 22:09:49 2008
From: guido at python.org (Guido van Rossum)
Date: Sat, 31 May 2008 13:09:49 -0700
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
	<fb6fbf560805271312g41c6b240h322c7142d1cf98be@mail.gmail.com>
	<797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com>
	<797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com>
	<ca471dc20805310932q18f1ba44r317eac65f1209258@mail.gmail.com>
	<797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>
Message-ID: <ca471dc20805311309x26dd3d87k42b891193b2dbdf7@mail.gmail.com>

Great -- get ready to make your patch perfect!

On Sat, May 31, 2008 at 10:30 AM, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
> Hi Guido,
>
> On Sun, Jun 1, 2008 at 1:32 AM, Guido van Rossum <guido at python.org> wrote:
>> Hi Atsuo,
>>
>> I'm very close to accepting your PEP. I have a few questions:
>
> Great!
>
>>
>> - The Rationale has a more elaborate (and perhaps slightly
>> conflicting, regarding the status of ASCII space?) definition of our
>> definition of non-printable than the Specification. Perhaps this could
>> be merged?
>>
>
> Yes, I'll merge them.
>
>> - I'm still not comfortable with making stdout default to
>> backslashreplace. Stuff written to stdout might be consumed by another
>> program that might misinterpret the \ escapes. Previously I thought I
>> was okay with doing this only if stdout.isatty() returns True, but I
>> think that would just add confusion of the kind "it works in
>> interactive mode but not when redirecting to a file".  I'm okay with
>> apps who think they need this setting that explicitly, but not to
>> having it be the default. (For stderr however I agree that
>> backslashreplace is the right default.)
>
> Okay, we'll keep 'strict' as default error handler for stdout always,
> then. I can live with it.
> But, my $0.02, I expect this issue will be revisited after people
> start to develop real applications with Python 3.x.
>
>>
>> - What happens to Unicode characters that are "unassigned"? I assume
>> there are many of those, especially outside the basic plane. Shouldn't
>> we be conservative and convert these to \u or \U escapes as well?
>>
>
> Unassigned characters are defined as 'Cn' in the Unicode database and
> they will be escaped.
>
> I'll update the PEP and the patch on Sunday. Thank you!
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)

From rhamph at gmail.com  Sat May 31 23:33:05 2008
From: rhamph at gmail.com (Adam Olsen)
Date: Sat, 31 May 2008 15:33:05 -0600
Subject: [Python-3000] Fwd: UPDATED: PEP 3138- String representation in
	Python 3000
In-Reply-To: <797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>
References: <797440730805240349t2d5430fxdf77c24eb7541204@mail.gmail.com>
	<fb6fbf560805261806j7b79e776i3361a655ac1d8a36@mail.gmail.com>
	<797440730805262311j6ece045apd731b4e78eacd4c5@mail.gmail.com>
	<fb6fbf560805271312g41c6b240h322c7142d1cf98be@mail.gmail.com>
	<797440730805272137s716d6b61l57d900b95364efd6@mail.gmail.com>
	<797440730805282340n1eea6597qfcabde6e1e611a0@mail.gmail.com>
	<ca471dc20805310932q18f1ba44r317eac65f1209258@mail.gmail.com>
	<797440730805311030q1fa2763bsc439f9183abcf3b5@mail.gmail.com>
Message-ID: <aac2c7cb0805311433q58a3e17ajd17e8debba10ca28@mail.gmail.com>

On Sat, May 31, 2008 at 11:30 AM, Atsuo Ishimoto <ishimoto at gembook.org> wrote:
> On Sun, Jun 1, 2008 at 1:32 AM, Guido van Rossum <guido at python.org> wrote:
>> - I'm still not comfortable with making stdout default to
>> backslashreplace. Stuff written to stdout might be consumed by another
>> program that might misinterpret the \ escapes. Previously I thought I
>> was okay with doing this only if stdout.isatty() returns True, but I
>> think that would just add confusion of the kind "it works in
>> interactive mode but not when redirecting to a file".  I'm okay with
>> apps who think they need this setting that explicitly, but not to
>> having it be the default. (For stderr however I agree that
>> backslashreplace is the right default.)
>
> Okay, we'll keep 'strict' as default error handler for stdout always,
> then. I can live with it.
> But, my $0.02, I expect this issue will be revisited after people
> start to develop real applications with Python 3.x.

I think the reason why strict/backslashreplace (respectively) work
well is that you can print a unicode string to stdout, have it fail
(encoding can't handle it), then get an exception printed to stderr
with the string escaped.

Making stderr stricter would make it unable to print the string and
making stdout less strict would let the error pass silently (printing
potential garbage instead).
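
A rough sketch of the two handlers side by side, using an in-memory ASCII stream as a stand-in for a terminal whose encoding cannot represent the text. The helper name encode_through and the choice of ASCII are assumptions for the illustration only.

```python
import io

def encode_through(errors, text):
    """Write text to an ASCII-only in-memory stream with the given error handler."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()

text = "caf\u00e9"  # contains one character that ASCII cannot encode

# 'strict' (the stdout default discussed above) refuses to write the text.
try:
    encode_through("strict", text)
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# 'backslashreplace' (the stderr default discussed above) escapes it instead.
escaped = encode_through("backslashreplace", text)
print(strict_failed, escaped)
```

So the failure on the stdout-like stream is loud, and the error report on the stderr-like stream can still show the offending string in escaped form.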

-- 
Adam Olsen, aka Rhamphoryncus

From dickinsm at gmail.com  Sat May 31 23:52:10 2008
From: dickinsm at gmail.com (Mark Dickinson)
Date: Sat, 31 May 2008 17:52:10 -0400
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <g1rr4o$956$1@ger.gmane.org>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>
	<g1ll0v$i0p$1@ger.gmane.org>
	<bbaeab100805291032v7749ba6n1752a0ef5be9cbca@mail.gmail.com>
	<g1rr4o$956$1@ger.gmane.org>
Message-ID: <5c6f2a5d0805311452v15a87d06g899fa8182dbd9d2a@mail.gmail.com>

On Sat, May 31, 2008 at 11:33 AM, Georg Brandl <g.brandl at gmx.net> wrote:

>
> Now that the docs are reST, the source is almost pretty enough to display
> it raw, but I could also imagine a "text" writer that removes the more
> obscure markup to present a casual-reader-friendly text version.
>
> The needed sources could then be distributed with Python -- it shouldn't
> be more than about 200 kb.
>

+1 from me.  Would this mean that htmllib and sgmllib could be
removed without further ado?

Mark

From andymac at bullseye.apana.org.au  Mon May 26 15:10:43 2008
From: andymac at bullseye.apana.org.au (Andrew MacIntyre)
Date: Tue, 27 May 2008 00:10:43 +1100
Subject: [Python-3000] [Python-Dev] Stabilizing the C API of 2.6 and 3.0
In-Reply-To: <48397ECC.9070805@cheimes.de>
References: <48397ECC.9070805@cheimes.de>
Message-ID: <483AB6D3.8010207@bullseye.andymac.org>

Christian Heimes wrote:

> The first set of betas of Python 2.6 and 3.0 is fast approaching. I'd like to
> grab the final chance and clean up the C API of 2.6 and 3.0. I know, I
> know, I brought up the topic two times in the past. But this time I mean
> it for real! :]

On the subject of stabilising the API, I assigned issue 2862 to you
concerning tidying up freelist management interfaces for ints and floats
(http://bugs.python.org/issue2862).

Note that the patch in issue 2862 is essentially orthogonal to the patch
in issue 2039, although any int/float freelist implementation changes
would require amendments.

Additionally, I notice that not all of the types with free lists have
grown routines to clear them - dicts, lists and sets are missing these
routines.  I will add a patch for these in the next few days if no-one
else gets there first.

On the subject of issue 2039, I've come to the view that "explicit is
better than implicit" applies to the freelist management, and with the
addition of freelist clearing routines called from gc.collect() I see
little reason to pursue bounding of freelist sizes (and would suggest
removal of existing bounding code in those freelist implementations that
currently have it).  I have also come to the view that pymalloc's
automatic attempts to return empty arenas to the OS should be changed to
an on-demand cleaning, called after all other cleanup in gc.collect().
Returning arenas, while not expensive in general, is nonetheless not free
(in performance terms).

-- 
-------------------------------------------------------------------------
Andrew I MacIntyre                     "These thoughts are mine alone..."
E-mail: andymac at bullseye.apana.org.au  (pref) | Snail: PO Box 370
        andymac at pcug.org.au             (alt) |        Belconnen ACT 2616
Web:    http://www.andymac.org/               |        Australia

From steve at holdenweb.com  Sat May 31 17:42:24 2008
From: steve at holdenweb.com (Steve Holden)
Date: Sat, 31 May 2008 11:42:24 -0400
Subject: [Python-3000] [Python-Dev] Finishing up PEP 3108
In-Reply-To: <g1rr4o$956$1@ger.gmane.org>
References: <bbaeab100805282138x5ccac2d8q100fa0e969828bb4@mail.gmail.com>	<g1ll0v$i0p$1@ger.gmane.org>	<bbaeab100805291032v7749ba6n1752a0ef5be9cbca@mail.gmail.com>
	<g1rr4o$956$1@ger.gmane.org>
Message-ID: <484171E0.4050204@holdenweb.com>

Georg Brandl wrote:
> Brett Cannon schrieb:
> 
>>>> Issue 2873 - htmllib is slated to go, but pydoc still uses it. Then
>>>> again, pydoc is busted thanks to the new doc format.
>>>
>>> I will try to handle this in the coming week.
>>>
>>
>> Fred had the interesting suggestion of removing pydoc in Py3K based on
>> the thinking that documentation tools like pydoc should be external to
>> Python. With the docs now so easy to generate directly, should pydoc
>> perhaps just be gutted to only what is needed for help() to work?
> 
> pydoc is fine for displaying docstring help, and interactive help.
> This should stay.
> 
> Of course, it would also be nice for ``help("if")`` to work effortlessly,
> which it currently only does if the generated HTML documentation is
> available somewhere, which it typically isn't -- on Unix most distributions
> put it in a separate package (from which pydoc won't always find it
> on its own), on Windows only the CHM file is distributed and must be
> decompiled to get single HTML files.
> 
> Now that the docs are reST, the source is almost pretty enough to display
> it raw, but I could also imagine a "text" writer that removes the more
> obscure markup to present a casual-reader-friendly text version.
> 
> The needed sources could then be distributed with Python -- it shouldn't
> be more than about 200 kb.

The versioned documentation will sometimes be available from the 
Internet if you want to think about using that as a fallback source. It 
*would* be nice if help("if") worked.

It would be even handier if help(if) worked, but that's a syntax 
problem, and it would be a horrendous one to overcome, I suspect.
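
The syntax problem can be shown without calling help() at all, by asking the compiler whether each spelling is a valid expression. The helper name compiles is made up for this sketch.

```python
def compiles(source):
    """Return True if source is a syntactically valid expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

string_form_ok = compiles('help("if")')  # keyword passed as a string
bare_form_ok = compiles("help(if)")      # bare keyword as an argument

print(string_form_ok, bare_form_ok)
```

The bare form is rejected before help() is ever looked up, because "if" is a keyword and cannot appear as an expression.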

regards
  Steve
-- 
Steve Holden        +1 571 484 6266   +1 800 494 3119
Holden Web LLC              http://www.holdenweb.com/