From oliphant.travis at ieee.org Wed Nov 1 00:03:01 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Tue, 31 Oct 2006 16:03:01 -0700
Subject: [Python-Dev] PEP: Adding data-type objects to Python
In-Reply-To: <79990c6b0610311312y2a749b4bw617f0cf18ae9d660@mail.gmail.com>
References: <20061028135415.GA13049@code0.codespeak.net> <4547007D.30404@v.loewis.de> <45478C71.2010600@v.loewis.de> <79990c6b0610311312y2a749b4bw617f0cf18ae9d660@mail.gmail.com>
Message-ID: 

Paul Moore wrote:
> On 10/31/06, Travis Oliphant wrote:
>
>> Martin v. Löwis wrote:
>>
>>> [...] because I still don't quite understand what the PEP
>>> wants to achieve.
>>
>> Are you saying you still don't understand after having read the
>> extended buffer protocol PEP, yet?
>
> I can't speak for Martin, but I don't understand how I, as a Python
> programmer, might use the data type objects specified in the PEP. I
> have skimmed the extended buffer protocol PEP, but I'm conscious that
> no objects I currently use support the extended buffer protocol (and
> the PEP doesn't mention adding support to existing objects), so I
> don't see that as too relevant to me.

Do you use the PIL? The PIL supports the array interface. CVXOPT supports the array interface. Numarray, Numeric, and NumPy all support the array interface.

> I have also installed numpy, and looked at the help for numpy.dtype,
> but that doesn't add much to the PEP.

The source code is available.

> The freely available chapters of the numpy book explain how dtypes
> describe data structures, but not how to use them. The freely
> available Numeric documentation doesn't refer to dtypes, as far as I
> can tell.

It kind of does; they are PyArray_Descr * structures in Numeric. They just aren't Python objects.

> Is there any documentation on how to use dtypes, independently of
> other features of numpy?

There are examples and other help pages at http://www.scipy.org

> If not, can you clarify where the benefit lies for a Python user of
> this proposal?
> (I understand the benefits of a common language for extensions to
> communicate datatype information, but why expose it to Python? How do
> Python users use it?)

The only benefit I imagine would be for an extension module library writer and for users of the struct and array modules. But, other than that, I don't know. It actually doesn't have to be exposed to Python. I used Python notation in the PEP to explain what is basically a C structure. I don't care if the object ever gets exposed to Python.

Maybe that's part of the communication problem.

> This is probably all self-evident to the numpy community, but I think
> that as the PEP is aimed at a wider audience it needs a little more
> background.

It's hard to write that background because most of what I understand is from the NumPy community. I can't give you all the examples, but my concern is that you have all these third-party libraries out there describing what is essentially binary data and using either string copies or the buffer protocol + extra information obtained by some method or attribute that varies across the implementations. There should really be a standard for describing this data. There are attempts at it in the struct and array modules. There is the approach of ctypes, but I claim that using Python type objects is overkill for the purposes of describing data-formats.

-Travis

From pj at place.org Wed Nov 1 00:05:57 2006
From: pj at place.org (Paul Jimenez)
Date: Tue, 31 Oct 2006 17:05:57 -0600
Subject: [Python-Dev] patch 1462525 or similar solution?
Message-ID: <20061031230557.656049036@place.org>

I submitted patch 1462525 awhile back to solve the problem described even longer ago in http://mail.python.org/pipermail/python-dev/2005-November/058301.html and I'm wondering what my appropriate next steps are.
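The scheme-dependent parsing problem the patch addresses is easiest to see against the generic RFC 3986 rules. As a sketch only (this uses the modern `urllib.parse` from the Python 3 stdlib for illustration; it is not the patch's own API), generic parsing gives every scheme the same netloc/path treatment:

```python
from urllib.parse import urlsplit

# Generic parsing: any scheme followed by "//" gets a netloc, which is
# what RFC 3986 prescribes and what scheme-specific parsers historically
# got wrong for less common schemes like svn+ssh.
parts = urlsplit("svn+ssh://user@example.org:2222/repo/trunk")

print(parts.scheme)    # svn+ssh
print(parts.netloc)    # user@example.org:2222
print(parts.hostname)  # example.org
print(parts.port)      # 2222
print(parts.path)      # /repo/trunk
```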
Honestly, I don't care if you take my patch or someone else's proposed solution, but I'd like to see something go into the stdlib so that I can eventually stop having to ship custom code for what is really a standard problem.

--pj

From rrr at ronadam.com Wed Nov 1 00:58:39 2006
From: rrr at ronadam.com (Ron Adam)
Date: Tue, 31 Oct 2006 17:58:39 -0600
Subject: [Python-Dev] PEP: Adding data-type objects to Python
In-Reply-To: 
References: <20061028135415.GA13049@code0.codespeak.net> <4547007D.30404@v.loewis.de> <45478C71.2010600@v.loewis.de> <79990c6b0610311312y2a749b4bw617f0cf18ae9d660@mail.gmail.com>
Message-ID: 

> The only benefit I imagine would be for an extension module library
> writer and for users of the struct and array modules. But, other than
> that, I don't know. It actually doesn't have to be exposed to Python.
> I used Python notation in the PEP to explain what is basically a
> C-structure. I don't care if the object ever gets exposed to Python.
>
> Maybe that's part of the communication problem.

I get the impression that where ctypes is good for accessing native C libraries from within Python, the data-type object is meant to add a more direct way to share native Python objects' *data* with C (or other languages) in a more efficient way. For data that can be represented well at contiguous memory addresses, it lightens the load: instead of a list of Python objects you get an "array of data for n python_type objects" without the duplication of the Python type for every element.

I think maybe some more complete examples demonstrating how it is to be used from both Python and C would be good.

Cheers, Ron

From oliphant.travis at ieee.org Wed Nov 1 01:13:37 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Tue, 31 Oct 2006 17:13:37 -0700
Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information.
In-Reply-To: <4547BF86.6070806@v.loewis.de>
References: <4547BF86.6070806@v.loewis.de>
Message-ID: 

Martin v.
Löwis wrote:
> Travis E. Oliphant schrieb:
>
>> Several extensions to Python utilize the buffer protocol to share
>> the location of a data-buffer that is really an N-dimensional
>> array. However, there is no standard way to exchange the
>> additional N-dimensional array information so that the data-buffer
>> is interpreted correctly. The NumPy project introduced an array
>> interface (http://numpy.scipy.org/array_interface.shtml) through a
>> set of attributes on the object itself. While this approach
>> works, it requires attribute lookups which can be expensive when
>> sharing many small arrays.
>
> Can you please give examples for real-world applications of this
> interface, preferably examples involving multiple
> independently-developed libraries?
> ("this" being the current interface in NumPy - I understand that
> the PEP's interface isn't implemented, yet)

Examples of Need

1) Suppose you have an image in *.jpg format that came from a camera and you want to apply Fourier-based image recovery to try to de-blur the image using modified Wiener filtering. Then you want to save the result in *.png format. The PIL provides an easy way to read *.jpg files into Python and write the result to *.png, and NumPy provides the FFT and the array math needed to implement the algorithm. Rather than have to dig into the details of how NumPy and the PIL interpret chunks of memory in order to write a "converter" between NumPy arrays and PIL arrays, there should be support in the buffer protocol so that one could write something like:

# Read the image
a = numpy.frombuffer(Image.open('myimage.jpg'))
# Process the image.
A = numpy.fft.fft2(a)
B = A*inv_filter
b = numpy.fft.ifft2(B).real
# Write it out
Image.frombuffer(b).save('filtered.png')

Currently, without this proposal, you have to worry about the "mode" the image is in and get its shape using a specific method call (this method call is different for every object you might want to interface with).
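The shape-and-format metadata argued for in this example is essentially what later grew into the revised buffer protocol (PEP 3118). As a rough, stdlib-only sketch of the idea using today's `memoryview` (the pixel values below are made up for illustration), N-dimensional shape information can travel with a block of memory without copying it:

```python
# A tiny 2x2 RGB "image" as raw bytes: 12 bytes, one per channel.
raw = bytes([255, 0, 0,   0, 255, 0,
             0, 0, 255,   255, 255, 255])

# Attach shape and format information to the buffer, zero-copy.
view = memoryview(raw).cast("B", shape=[2, 2, 3])

print(view.shape)     # (2, 2, 3)
print(view.format)    # B (unsigned byte)
print(view[0, 1, 1])  # 255 -- green channel of pixel (0, 1)
```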
2) The same argument applies to libraries that read and write audio or video formats.

3) You want to blit images onto a GUI image buffer for rapid updates, but need to do math processing on the image values themselves, or you want to read the images from files supported by the PIL. If the PIL supported the extended buffer protocol, then you would not need to worry about the "mode" and the "shape" of the Image. What's more, you would also be able to accept images from any object (like NumPy arrays or ctypes arrays) that supported the extended buffer protocol without having to learn how it shares information like shape and data-format.

I could have also included examples from PyGame, OpenGL, etc. I thought people were more aware of this argument as we've made it several times over the years. It's just taken this long to get to a point to start asking for something to get into Python.

> Paul Moore (IIRC) gave the example of equalising the green values
> and maximizing the red values in a PIL image by passing it to NumPy:
> Is that a realistic (even though not-yet real-world) example?

I think so, but I've never done something like that.

> If so, what algorithms of NumPy would I use to perform this image
> manipulation (and why would I use NumPy for it if I could just
> write a for loop that does that in pure Python, given PIL's
> getpixel/setdata)?

Basically you would use array math operations and reductions (ufuncs and their methods, which are included in NumPy). You would do it this way for speed. It's going to be a lot slower doing those loops in Python. NumPy provides the ability to do them at close-to-C speeds.

-Travis

From tjreedy at udel.edu Wed Nov 1 01:24:34 2006
From: tjreedy at udel.edu (Terry Reedy)
Date: Tue, 31 Oct 2006 19:24:34 -0500
Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information.
References: <4547BF86.6070806@v.loewis.de>
Message-ID: 

"Martin v. Löwis" wrote in message news:4547BF86.6070806 at v.loewis.de...
> Paul Moore (IIRC) gave the example of equalising the green values
> and maximizing the red values in a PIL image by passing it to NumPy:
> Is that a realistic (even though not-yet real-world) example? If
> so, what algorithms of NumPy would I use to perform this image
> manipulation

The use of surfarrays manipulated by Numeric has been an optional but important part of PyGame for years. http://www.pygame.org/docs/ says:

    Surfarray Introduction
    Pygame uses the Numeric python module to allow efficient per pixel
    effects on images. Using the surface arrays is an advanced feature
    that allows custom effects and filters. This also examines some of
    the simple effects from the Pygame example, arraydemo.py.

The Examples section of the linked page http://www.pygame.org/docs/tut/surfarray/SurfarrayIntro.html has code snippets for generating, resizing, recoloring, filtering, and cross-fading images.

> (and why would I use NumPy for it if I could just
> write a for loop that does that in pure Python, given PIL's
> getpixel/setdata)?

Why does anyone use Numeric/NumArray/NumPy? Faster, easier coding and much faster execution, which is especially important when straining for an acceptable framerate.

----

I believe that at present PyGame can only work with external images that it is programmed to know how to import. My guess is that if image source program X (such as PIL) described its data layout in a way that NumPy could read and act on, the import/copy step could be eliminated. But perhaps Travis can clarify this.

Terry Jan Reedy

From wbaxter at gmail.com Wed Nov 1 02:58:41 2006
From: wbaxter at gmail.com (Bill Baxter)
Date: Wed, 1 Nov 2006 01:58:41 +0000 (UTC)
Subject: [Python-Dev] PEP: Adding data-type objects to Python
References: 
Message-ID: 

One thing I'm curious about in the ctypes vs this PEP debate is the following. How do the approaches differ in practice if I'm developing a library that wants to accept various image formats that all describe the same thing: rgb data.
Let's say for now all I want to support is two different image formats whose pixels are described in C structs by:

struct rgb565 {
    unsigned short r:5;
    unsigned short g:6;
    unsigned short b:5;
};

struct rgb101210 {
    unsigned int r:10;
    unsigned int g:12;
    unsigned int b:10;
};

Basically in my code I want to be able to take the binary data descriptor and say "give me the 'r' field of this pixel as an integer". Is either one (the PEP or ctypes) clearly easier to use in this case? What would the code look like for handling both formats generically?

--bb

From sluggoster at gmail.com Wed Nov 1 04:14:13 2006
From: sluggoster at gmail.com (Mike Orr)
Date: Tue, 31 Oct 2006 19:14:13 -0800
Subject: [Python-Dev] Path object design
Message-ID: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com>

I just saw the Path object thread ("PEP 355 status", Sept-Oct), saying that the first object-oriented proposal was rejected. I'm in favor of the "directory tuple" approach which wasn't mentioned in the thread. This was proposed by Noam Raphael several months ago: a Path object that's a sequence of components (a la os.path.split) rather than a string. The beauty of this approach is that slicing and joining are expressed naturally using the [] and + operators, eliminating several methods.

Introduction: http://wiki.python.org/moin/AlternativePathClass
Feature discussion: http://wiki.python.org/moin/AlternativePathDiscussion
Reference implementation: http://wiki.python.org/moin/AlternativePathModule

(There's a link to the introduction at the end of PEP 355.) Right now I'm working on a test suite, then I want to add the features marked "Mike" in the discussion -- in a way that people can compare the feature alternatives in real code -- and write a PEP. But it's a big job for one person, and there are unresolved issues on the discussion page, not to mention things brought up in the "PEP 355 status" thread.
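The []-and-+ path algebra described above can be sketched in a few lines. This `TuplePath` class is hypothetical — one possible reading of the directory-tuple idea, not the wiki's reference implementation:

```python
class TuplePath(tuple):
    """Hypothetical sketch: a path is a sequence of components, so
    slicing and joining are just the [] and + operators."""

    def __new__(cls, *parts):
        return super().__new__(cls, parts)

    def __add__(self, other):
        # Joining: path + "name" or path + other_path.
        if isinstance(other, str):
            other = (other,)
        return TuplePath(*self, *other)

    def __getitem__(self, index):
        result = super().__getitem__(index)
        if isinstance(index, slice):
            return TuplePath(*result)
        return result

    def __str__(self):
        # posix-style separator for the demo; a real class would use os.sep
        return "/".join(self)

p = TuplePath("toplevel", "app1", "bin", "main_program.py")
lib = p[:-2] + "lib"           # "../lib" relative to the script
print(str(lib))                # toplevel/app1/lib
```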
We had three people working on the discussion page but development seems to have ground to a halt.

One thing is sure -- we urgently need something better than os.path. It functions well but it makes hard-to-read and unpythonic code. For instance, I have an application that has to add its libraries to the Python path, relative to the executable's location.

/toplevel
    app1/
        bin/
            main_program.py
            utility1.py
            init_app.py
        lib/
            app_module.py
    shared/
        lib/
            shared_module.py

The solution I've found is an init_app module in every application that sets up the paths. Conceptually it needs "../lib" and "../../shared/lib", but I want the absolute paths without hardcoding them, in a platform-neutral way. With os.path, "../lib" is:

os.path.join(os.path.dirname(os.path.dirname(__file__)), "lib")

YUK! Compare to PEP 355:

Path(__file__).parent.parent.join("lib")

Much easier to read and debug. Under Noam's proposal it would be:

Path(__file__)[:-2] + "lib"

I'd also like to see the methods more intelligent: don't raise an error if an operation is already done (e.g., a directory exists or a file is already removed). There's no reason to clutter one's code with extra if's when the methods can easily encapsulate this. This was considered a too radical departure from os.path for some, but I have in mind even more radical convenience methods which I'd put in a third-party subclass if they're not accepted into the standard library, the way 'datetime' has third-party subclasses.

In my application I started using Orendorff's path module, expecting the standard path object would be close to it. When PEP 355 started getting more changes and the directory-based alternative took off, I took path.py out and rewrote my code for os.path until an alternative becomes more stable. Now it looks like it will be several months and possibly several third-party packages until one makes it into the standard library. This is unfortunate.
Not only does it mean ugly code in applications, but it means packages can't accept or return Path objects and expect them to be compatible with other packages. The reasons PEP 355 was rejected also sound strange. Nick Coghlan wrote (Oct 1): > Things the PEP 355 path object lumps together: > - string manipulation operations > - abstract path manipulation operations (work for non-existent filesystems) > - read-only traversal of a concrete filesystem (dir, stat, glob, etc) > - addition & removal of files/directories/links within a concrete filesystem > Dumping all of these into a single class is certainly practical from a utility > point of view, but it's about as far away from beautiful as you can get, which > creates problems from a learnability point of view, and from a > capability-based security point of view. What about the convenience of the users and the beauty of users' code? That's what matters to me. And I consider one class *easier* to learn. I'm tired of memorizing that 'split' is in os.path while 'remove' and 'stat' are in os. This seems arbitrary: you're statting a path, aren't you? Also, if you have four classes (abstract path, file, directory, symlink), *each* of those will have 3+ platform-specific versions. Then if you want to make an enhancement subclass you'll have to make 12 of them, one for each of the 3*4 combinations of superclasses. Encapsulation can help with this, but it strays from the two-line convenience for the user: from path import Path p = Path("ABC") # Works the same for files/directories on any platform. Nevertheless, I'm open to seeing a multi-class API, though hopefully less verbose than Talin's preliminary one (Oct 26). Is it necessary to support path.parent(), pathobj.parent(), io.dir.listdir(), *and* io.dir.Directory(). 
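For comparison, the one-class approach argued for here is roughly the shape the stdlib eventually took with pathlib (Python 3.4, long after this thread). A sketch using its pure-path flavor, which does only name manipulation:

```python
from pathlib import PurePosixPath

p = PurePosixPath("/toplevel/app1/bin/init_app.py")

# "../lib" relative to the script, without hardcoding:
lib = p.parent.parent / "lib"
print(lib)            # /toplevel/app1/lib

# Name manipulation lives on the same object as the path itself.
print(p.name)         # init_app.py
print(p.suffix)       # .py
```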
That's four different namespaces to memorize which function/method is where, and if a function/method belongs to multiple ones it'll be duplicated, and you'll have to remember that some methods are duplicated and others aren't... Plus, constructors like io.dir.Directory() look too verbose. io.Directory() might be acceptable, with the functions as class methods. I agree that supporting non-filesystem directories (zip files, CSV/Subversion sandboxes, URLs) would be nice, but we already have a big enough project without that. What constraints should a Path object keep in mind in order to be forward-compatible with this? If anyone has design ideas/concerns about a new Path class(es), please post them. If anyone would like to work on a directory-based spec/implementation, please email me. -- Mike Orr From talin at acm.org Wed Nov 1 04:20:50 2006 From: talin at acm.org (Talin) Date: Tue, 31 Oct 2006 19:20:50 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com> References: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com> Message-ID: <45481292.7040403@acm.org> I'm right in the middle of typing up a largish post to go on the Python-3000 mailing list about this issue. Maybe we should move it over there, since its likely that any path reform will have to be targeted at Py3K...? Mike Orr wrote: > I just saw the Path object thread ("PEP 355 status", Sept-Oct), saying > that the first object-oriented proposal was rejected. I'm in favor of > the "directory tuple" approach which wasn't mentioned in the thread. > This was proposed by Noal Raphael several months ago: a Path object > that's a sequence of components (a la os.path.split) rather than a > string. The beauty of this approach is that slicing and joining are > expressed naturally using the [] and + operators, eliminating several > methods. 
> > Introduction: http://wiki.python.org/moin/AlternativePathClass > Feature discussion: http://wiki.python.org/moin/AlternativePathDiscussion > Reference implementation: http://wiki.python.org/moin/AlternativePathModule > > (There's a link to the introduction at the end of PEP 355.) Right now > I'm working on a test suite, then I want to add the features marked > "Mike" in the discussion -- in a way that people can compare the > feature alternatives in real code -- and write a PEP. But it's a big > job for one person, and there are unresolved issues on the discussion > page, not to mention things brought up in the "PEP 355 status" thread. > We had three people working on the discussion page but development > seems to have ground to a halt. > > One thing is sure -- we urgently need something better than os.path. > It functions well but it makes hard-to-read and unpythonic code. For > instance, I have an application that has to add its libraries to the > Python path, relative to the executable's location. > > /toplevel > app1/ > bin/ > main_progam.py > utility1.py > init_app.py > lib/ > app_module.py > shared/ > lib/ > shared_module.py > > The solution I've found is an init_app module in every application > that sets up the paths. Conceptually it needs "../lib" and > "../../shared/lib", but I want the absolute paths without hardcoding > them, in a platform-neutral way. With os.path, "../lib" is: > > os.path.join(os.path.dirname(os.path.dirname(__FILE__)), "lib") > > YUK! Compare to PEP 355: > > Path(__FILE__).parent.parent.join("lib") > > Much easier to read and debug. Under Noam's proposal it would be: > > Path(__FILE__)[:-2] + "lib" > > I'd also like to see the methods more intelligent: don't raise an > error if an operation is already done (e.g., a directory exists or a > file is already removed). There's no reason to clutter one's code > with extra if's when the methods can easily encapsulate this. 
This was > considered a too radical departure from os.path for some, but I have > in mind even more radical convenience methods which I'd put in a > third-party subclass if they're not accepted into the standard > library, the way 'datetime' has third-party subclasses. > > In my application I started using Orendorff's path module, expecting > the standard path object would be close to it. When PEP 355 started > getting more changes and the directory-based alternative took off, I > took path.py out and rewrote my code for os.path until an alternative > becomes more stable. Now it looks like it will be several months and > possibly several third-party packages until one makes it into the > standard library. This is unfortunate. Not only does it mean ugly > code in applications, but it means packages can't accept or return > Path objects and expect them to be compatible with other packages. > > The reasons PEP 355 was rejected also sound strange. Nick Coghlan > wrote (Oct 1): > >> Things the PEP 355 path object lumps together: >> - string manipulation operations >> - abstract path manipulation operations (work for non-existent filesystems) >> - read-only traversal of a concrete filesystem (dir, stat, glob, etc) >> - addition & removal of files/directories/links within a concrete filesystem > >> Dumping all of these into a single class is certainly practical from a utility >> point of view, but it's about as far away from beautiful as you can get, which >> creates problems from a learnability point of view, and from a >> capability-based security point of view. > > What about the convenience of the users and the beauty of users' code? > That's what matters to me. And I consider one class *easier* to > learn. I'm tired of memorizing that 'split' is in os.path while > 'remove' and 'stat' are in os. This seems arbitrary: you're statting > a path, aren't you? 
Also, if you have four classes (abstract path, > file, directory, symlink), *each* of those will have 3+ > platform-specific versions. Then if you want to make an enhancement > subclass you'll have to make 12 of them, one for each of the 3*4 > combinations of superclasses. Encapsulation can help with this, but > it strays from the two-line convenience for the user: > > from path import Path > p = Path("ABC") # Works the same for files/directories on any platform. > > Nevertheless, I'm open to seeing a multi-class API, though hopefully > less verbose than Talin's preliminary one (Oct 26). Is it necessary > to support path.parent(), pathobj.parent(), io.dir.listdir(), *and* > io.dir.Directory(). That's four different namespaces to memorize > which function/method is where, and if a function/method belongs to > multiple ones it'll be duplicated, and you'll have to remember that > some methods are duplicated and others aren't... Plus, constructors > like io.dir.Directory() look too verbose. io.Directory() might be > acceptable, with the functions as class methods. > > I agree that supporting non-filesystem directories (zip files, > CSV/Subversion sandboxes, URLs) would be nice, but we already have a > big enough project without that. What constraints should a Path > object keep in mind in order to be forward-compatible with this? > > If anyone has design ideas/concerns about a new Path class(es), please > post them. If anyone would like to work on a directory-based > spec/implementation, please email me. > From tjreedy at udel.edu Wed Nov 1 05:01:27 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Tue, 31 Oct 2006 23:01:27 -0500 Subject: [Python-Dev] PEP: Extending the buffer protocol to share arrayinformation. References: <4547BF86.6070806@v.loewis.de> Message-ID: "Travis Oliphant" wrote in message news:ei8ors$7m4$1 at sea.gmane.org... >Examples of Need [snip] < I could have also included examples from PyGame, OpenGL, etc. 
I thought
>people were more aware of this argument as we've made it several times
>over the years. It's just taken this long to get to a point to start
>asking for something to get into Python.

The problem of data format definition and sharing of data between applications has been a bugaboo of computer science for decades. But some have butted their heads against it more than others. Something which made a noticeable dent in the problem, by making sharing 'just work' more easily, would, to me, be a real plus for Python.

tjr

From ronaldoussoren at mac.com Wed Nov 1 07:53:27 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Wed, 1 Nov 2006 07:53:27 +0100
Subject: [Python-Dev] PEP: Adding data-type objects to Python
In-Reply-To: <454789F9.7050808@ctypes.org>
References: <45468C8E.1000203@canterbury.ac.nz> <454789F9.7050808@ctypes.org>
Message-ID: 

On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote:
>
> This mechanism is probably a hack because it's not possible to add
> C accessible fields to type objects, on the other hand it is
> extensible (in principle, at least).

I better start rewriting PyObjC then :-). PyObjC stores some additional information in the type objects that are used to describe Objective-C classes (such as a reference to the proxied class). IIRC this has been possible from Python 2.3.

Ronald

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3562 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20061101/660e4844/attachment.bin

From fredrik at pythonware.com Wed Nov 1 08:45:06 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Wed, 01 Nov 2006 08:45:06 +0100
Subject: [Python-Dev] Path object design
In-Reply-To: <45481292.7040403@acm.org>
References: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com> <45481292.7040403@acm.org>
Message-ID: 

Talin wrote:
> I'm right in the middle of typing up a largish post to go on the
> Python-3000 mailing list about this issue. Maybe we should move it over
> there, since it's likely that any path reform will have to be targeted at
> Py3K...?

I'd say that any proposal that cannot be fit into the current 2.X design is simply too disruptive to go into 3.0. So here's my proposal for 2.6 (reposted from the 3K list). This is fully backwards compatible, can go right into 2.6 without breaking anything, allows people to update their code as they go, and can be incrementally improved in future releases:

1) Add a pathname wrapper to "os.path", which lets you do basic path "algebra". This should probably be a subclass of unicode, and should *only* contain operations on names.

2) Make selected "shutil" operations available via the "os" namespace; the old POSIX API vs. POSIX SHELL distinction is pretty irrelevant. Also make the os.path predicates available via the "os" namespace.

This gives a very simple conceptual model for the user; to manipulate path *names*, use "os.path.<op>(string)" functions or the "<path>" wrapper. To manipulate *objects* identified by a path, given either as a string or a path wrapper, use "os.<op>(path)". This can be taught in less than a minute.

With this in place in 2.6 and 2.7, all that needs to be done for 3.0 is to remove (some of) the old cruft.
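Item (1) above, a wrapper that does only name algebra, might look something like this sketch (`PathName` and its methods are hypothetical names, not Fredrik's actual proposal; posixpath is used so the example is platform-independent):

```python
import posixpath

class PathName(str):
    """Hypothetical name-algebra wrapper: manipulates path *names*
    only and never touches the filesystem."""

    def joinpath(self, *parts):
        return PathName(posixpath.join(self, *parts))

    @property
    def parent(self):
        return PathName(posixpath.dirname(self))

    @property
    def basename(self):
        return PathName(posixpath.basename(self))

p = PathName("/usr/local/lib/python2.6")
print(p.parent)                              # /usr/local/lib
print(p.joinpath("site-packages").basename)  # site-packages
```

Because it subclasses str, such a wrapper stays fully backwards compatible: it can be passed to every existing os function unchanged.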
From fredrik at pythonware.com Wed Nov 1 08:53:04 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 08:53:04 +0100 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Terry Reedy wrote: > I believe that at present PyGame can only work with external images that it > is programmed to know how to import. My guess is that if image source > program X (such as PIL) described its data layout in a way that NumPy could > read and act on, the import/copy step could be eliminated. I wish you all stopped using PIL as an example in this discussion; for PIL 2, I'm moving towards an entirely opaque data model, with a "data view"-style client API. From martin at v.loewis.de Wed Nov 1 09:16:25 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 09:16:25 +0100 Subject: [Python-Dev] patch 1462525 or similar solution? In-Reply-To: <20061031230557.656049036@place.org> References: <20061031230557.656049036@place.org> Message-ID: <454857D9.2040008@v.loewis.de> Paul Jimenez schrieb: > I submitted patch 1462525 awhile back to > solve the problem described even longer ago in > http://mail.python.org/pipermail/python-dev/2005-November/058301.html > and I'm wondering what my appropriate next steps are. Honestly, I don't > care if you take my patch or someone else's proposed solution, but I'd > like to see something go into the stdlib so that I can eventually stop > having to ship custom code for what is really a standard problem. The problem, as I see it, is that we cannot afford to include an "incorrect" library *again*. urllib may be ill-designed, but can't be changed for backwards compatibility reasons. The same should not happen to urilib: it has to be "right" from the start. So the question is: are you willing to work on it until it is right? 
I just reviewed it a bit, and have a number of questions: - Can you please sign a contributor form, from http://www.python.org/psf/contrib/ and then add the magic words ("Licensed to PSF under a Contributor Agreement.") to this code? - I notice there is no documentation. Can you please come up with a patch to Doc/lib? - Also, there are no test cases. Can you please come up with a test suite? - Is this library also meant to support creation of URIs? If so, shouldn't it also do percent-encoding, if the input contains reserved characters. Also, shouldn't it perform percent-undecoding when the URI contains unreserved characters? - Should this library support RFC 3987 also? - Why does the code still name things "URL"? The RFC avoids this name throughout (except for explaining that the fact that the URI is a locator is really irrelevant) Regards, Martin From martin at v.loewis.de Wed Nov 1 09:24:06 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 09:24:06 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: Message-ID: <454859A6.4050904@v.loewis.de> Bill Baxter schrieb: > Basically in my code I want to be able to take the binary data descriptor and > say "give me the 'r' field of this pixel as an integer". > > Is either one (the PEP or c-types) clearly easier to use in this case? What > would the code look like for handling both formats generically? The PEP, as specified, does not support accessing individual fields from Python. OTOH, ctypes, as implemented, does. This comparison is not fair, though: an *implementation* of the PEP (say, NumPy) might also give you Python-level access to the fields. With the PEP, you can get access to the 'r' field from C code. Performing this access is quite tedious; as I'm uncertain whether you actually wanted to see C code, I refrain from trying to formulate it. 
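The Python-level field access Martin refers to can be sketched with ctypes bit fields. This is only an illustration of the ctypes route for Bill Baxter's rgb565 example, not code from the PEP; note that bit-field layout ultimately follows the platform C compiler (LSB-first within the unit on common little-endian targets):

```python
import ctypes

class RGB565(ctypes.LittleEndianStructure):
    # Mirrors Bill Baxter's struct rgb565: three bit fields in one
    # 16-bit unit.  Layout assumption: LSB-first allocation.
    _fields_ = [("r", ctypes.c_ushort, 5),
                ("g", ctypes.c_ushort, 6),
                ("b", ctypes.c_ushort, 5)]

# 0b11111 in the low five bits: a pure-red pixel on LSB-first layouts.
pixel = RGB565.from_buffer_copy(b"\x1f\x00")

# "give me the 'r' field of this pixel as an integer", generically:
for field_name, _, _ in RGB565._fields_:
    print(field_name, getattr(pixel, field_name))
```

Handling rgb101210 generically is then just a second Structure subclass with different widths; the `getattr` loop does not change.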
Regards, Martin From glyph at divmod.com Wed Nov 1 09:36:11 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 01 Nov 2006 08:36:11 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> On 03:14 am, sluggoster at gmail.com wrote: >One thing is sure -- we urgently need something better than os.path. >It functions well but it makes hard-to-read and unpythonic code. I'm not so sure. The need is not any more "urgent" today than it was 5 years ago, when os.path was equally "unpythonic" and unreadable. The problem is real but there is absolutely no reason to hurry to a premature solution. I've already recommended Twisted's twisted.python.filepath module as a possible basis for the implementation of this feature. I'm sorry I don't have the time to pursue that. I'm also sad that nobody else seems to have noticed. Twisted's implementation has an advantage that these new proposals don't seem to have, an advantage I would really like to see in whatever gets seriously considered for adoption: *It is already used in a large body of real, working code, and therefore its limitations are known.* If I'm wrong about this, and I can't claim to really know about the relative levels of usage of all of these various projects when they're not mentioned, please cite actual experiences using them vs. using os.path. Proposals for extending the language are contentious and it is very difficult to do experimentation with non-trivial projects because nobody wants to do that and then end up with a bunch of code written in a language that is no longer supported when the experiment fails. I understand, therefore, that language-change proposals are going to be very contentious no matter what. However, there is no reason that library changes need to follow this same path.
It is perfectly feasible to write a library, develop some substantial applications with it, tweak it based on that experience, and *THEN* propose it for inclusion in the standard library. Users of the library can happily continue using the library, whether it is accepted or not, and users of the language and standard library get a new feature for free. For example, I plan to continue using FilePath regardless of the outcome of this discussion, although perhaps some conversion methods or adapters will be in order if a new path object makes it into the standard library. I specifically say "library" and not "recipe". This is not a useful exercise if every user of the library has a subtly incompatible and manually tweaked version for their particular application. Path representation is a bike shed. Nobody would have proposed writing an entirely new embedded database engine for Python: python 2.5 simply included SQLite because its utility was already proven. I also believe it is important to get this issue right. It might be a bike shed, but it's a *very important* bike shed. Google for "web server url filesystem path vulnerability" and you'll see what I mean. Getting it wrong (or passing strings around everywhere) means potential security gotchas lurking around every corner. Even Twisted, with no C code at all, got its only known arbitrary-code-execution vulnerability from a path manipulation bug. That was even after we'd switched to an OO path-manipulation layer specifically to avoid bugs like this! I am not addressing this message to the py3k list because its general message of extreme conservatism on new features is more applicable to python-dev. However, py3k designers might also take note: if py3k is going to do something in this area and drop support for the "legacy" os.path, it would be good to choose something that is known to work and have few gotchas, rather than just choosing the devil we don't know over the devil we do.
The weaknesses of os.path are at least well-understood. From fredrik at pythonware.com Wed Nov 1 10:11:12 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 10:11:12 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> References: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> Message-ID: glyph at divmod.com wrote: > I am not addressing this message to the py3k list because its general > message of extreme conservatism on new features is more applicable to > python-dev. However, py3k designers might also take note: if py3k is > going to do something in this area and drop support for the "legacy" > os.path, it would be good to choose something that is known to work and > have few gotchas, rather than just choosing the devil we don't know over > the devil we do. The weaknesses of os.path are at least well-understood. that's another reason why a new design might as well be defined in terms of the old design -- especially if the main goal is call-site convenience, rather than fancy new algorithms. From ncoghlan at gmail.com Wed Nov 1 10:41:41 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 01 Nov 2006 19:41:41 +1000 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <45468C8E.1000203@canterbury.ac.nz> <4547452A.5040501@gmail.com> Message-ID: <45486BD5.9060000@gmail.com> Travis Oliphant wrote: > Nick Coghlan wrote: >> In fact, it may make sense to just use the lists/strings directly as the data >> exchange format definitions, and let the various libraries do their own >> translation into their private format descriptions instead of creating a new >> one-type-to-describe-them-all. > > Yes, I'm open to this possibility.
I basically want two things in the > object passed through the extended buffer protocol: > > 1) It's fast on the C-level > 2) It covers all the use-cases. > > If just a particular string or list structure were passed, then I would > drop the data-format PEP and just have the dataformat argument of the > extended buffer protocol be that thing. > > Then, something that converts ctypes objects to that special format > would be very nice indeed. It may make sense to have a couple distinct sections in the datatype PEP: a. describing data formats with basic Python types b. a lightweight class for parsing these data format descriptions It's most of the way there already - part A would just be the various styles of arguments accepted by the datatype constructor, and part B would be the datatype object itself. I personally think it makes the most sense to do both, but separating the two would make it clear that the descriptions can be standardised without *necessarily* defining a new class. Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From g.brandl at gmx.net Wed Nov 1 11:02:39 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 01 Nov 2006 11:02:39 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <6e9196d20610311914p6031ad31yb13672bb467815ef@mail.gmail.com> <45481292.7040403@acm.org> Message-ID: Fredrik Lundh wrote: > Talin wrote: > >> I'm right in the middle of typing up a largish post to go on the >> Python-3000 mailing list about this issue. Maybe we should move it over >> there, since its likely that any path reform will have to be targeted at >> Py3K...? > > I'd say that any proposal that cannot be fit into the current 2.X design > is simply too disruptive to go into 3.0. So here's my proposal for 2.6 > (reposted from the 3K list). 
> > This is fully backwards compatible, can go right into 2.6 without > breaking anything, allows people to update their code as they go, > and can be incrementally improved in future releases: > > 1) Add a pathname wrapper to "os.path", which lets you do basic > path "algebra". This should probably be a subclass of unicode, > and should *only* contain operations on names. > > 2) Make selected "shutil" operations available via the "os" name- > space; the old POSIX API vs. POSIX SHELL distinction is pretty > irrelevant. Also make the os.path predicates available via the > "os" namespace. > > This gives a very simple conceptual model for the user; to manipulate > path *names*, use "os.path.<op>(string)" functions or the "<path>" > wrapper. To manipulate *objects* identified by a path, given either as > a string or a path wrapper, use "os.<op>(path)". This can be taught in > less than a minute. +1. This is really straightforward and easy to learn. I have been a supporter of the full-blown Path object in the past, but the recent discussions have convinced me that it is just too big and too confusing, and that you can't kill too many birds with one stone in this respect. Most of the ugliness really lies in the path name manipulation functions, which nicely map to methods on a path name object. Georg From g.brandl at gmx.net Wed Nov 1 11:06:14 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 01 Nov 2006 11:06:14 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> References: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> Message-ID: glyph at divmod.com wrote: > On 03:14 am, sluggoster at gmail.com wrote: > > >One thing is sure -- we urgently need something better than os.path. > >It functions well but it makes hard-to-read and unpythonic code. > > I'm not so sure.
The need is not any more "urgent" today than it was 5 > years ago, when os.path was equally "unpythonic" and unreadable. The > problem is real but there is absolutely no reason to hurry to a > premature solution. > > I've already recommended Twisted's twisted.python.filepath module as a > possible basis for the implementation of this feature. I'm sorry I > don't have the time to pursue that. I'm also sad that nobody else seems > to have noticed. Twisted's implementation has an advantage that it > doesn't seem that these new proposals do, an advantage I would really > like to see in whatever gets seriously considered for adoption: Looking at , it seems as if FilePath was made to serve a different purpose than what we're trying to discuss here: """ I am a path on the filesystem that only permits 'downwards' access. Instantiate me with a pathname (for example, FilePath('/home/myuser/public_html')) and I will attempt to only provide access to files which reside inside that path. [...] The correct way to use me is to instantiate me, and then do ALL filesystem access through me. """ What a successor to os.path needs is not security, it's a better (more pythonic, if you like) interface to the old functionality. Georg From ncoghlan at gmail.com Wed Nov 1 11:16:02 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 01 Nov 2006 20:16:02 +1000 Subject: [Python-Dev] patch 1462525 or similar solution? In-Reply-To: <20061031230557.656049036@place.org> References: <20061031230557.656049036@place.org> Message-ID: <454873E2.3060308@gmail.com> Paul Jimenez wrote: > I submitted patch 1462525 awhile back to > solve the problem described even longer ago in > http://mail.python.org/pipermail/python-dev/2005-November/058301.html > and I'm wondering what my appropriate next steps are.
Honestly, I don't > care if you take my patch or someone else's proposed solution, but I'd > like to see something go into the stdlib so that I can eventually stop > having to ship custom code for what is really a standard problem. Something that has been lurking on my to-do list for the past year(!) is to get the urischemes module I wrote based on your uriparse module off the Python patch tracker [1] and into the cheese shop somewhere. It already has limited documentation in the form of docstrings with doctest examples (although the non-doctest examples in the module docstring still need to be fixed), and there are a whole barrel of tests in the _test() function which could be converted to unittest fairly easily. The reason I'd like to see something in the cheese shop rather than going straight into the standard library is that: 1. It may help people now, rather than in 18-24 months when 2.6 comes out 2. The module can see some real world usage to firm up the API before we commit to it for the standard lib (if it gets added at all) That said, I don't see myself finding the roundtuits to publish and promote this anytime soon :( Cheers, Nick. [1] http://sourceforge.net/tracker/?func=detail&aid=1500504&group_id=5470&atid=305470 -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From anthony at python.org Wed Nov 1 11:50:32 2006 From: anthony at python.org (Anthony Baxter) Date: Wed, 1 Nov 2006 21:50:32 +1100 Subject: [Python-Dev] RELEASED Python 2.3.6, FINAL Message-ID: <200611012150.44644.anthony@python.org> On behalf of the Python development team and the Python community, I'm happy to announce the release of Python 2.3.6 (FINAL). Python 2.3.6 is a security bug-fix release. While Python 2.5 is the latest version of Python, we're making this release for people who are still running Python 2.3.
Unlike the recently released 2.4.4, this release only contains a small handful of security-related bugfixes. See the website for more. * Python 2.3.6 contains a fix for PSF-2006-001, a buffer overrun * in repr() of unicode strings in wide unicode (UCS-4) builds. * See http://www.python.org/news/security/PSF-2006-001/ for more. This is a **source only** release. The Windows and Mac binaries of 2.3.5 were built with UCS-2 unicode, and are therefore not vulnerable to the problem outlined in PSF-2006-001. The PCRE fix is for a long-deprecated module (you should use the 're' module instead) and the email fix can be obtained by downloading the standalone version of the email package. Most vendors who ship Python should have already released a patched version of 2.3.5 with the above fixes; this release is for people who need or want to build their own release, but don't want to mess around with patch or svn. There have been no changes (apart from the version number) since the release candidate of 2.3.6. Python 2.3.6 will complete python.org's response to PSF-2006-001. If you're still on Python 2.2 for some reason and need to work with UCS-4 unicode strings, please obtain the patch from the PSF-2006-001 security advisory page. Python 2.4.4 and Python 2.5 have both already been released and contain the fix for this security problem. For more information on Python 2.3.6, including download links for source archives, release notes, and known issues, please see: http://www.python.org/2.3.6 Highlights of this new release include: - A fix for PSF-2006-001, a bug in repr() for unicode strings on UCS-4 (wide unicode) builds. - Two other, less critical, security fixes. Enjoy this release, Anthony Anthony Baxter anthony at python.org Python Release Manager (on behalf of the entire python-dev team)
From jml at mumak.net Wed Nov 1 12:57:43 2006 From: jml at mumak.net (Jonathan Lange) Date: Wed, 1 Nov 2006 22:57:43 +1100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> Message-ID: On 11/1/06, Georg Brandl wrote: > glyph at divmod.com wrote: > > On 03:14 am, sluggoster at gmail.com wrote: > > > > >One thing is sure -- we urgently need something better than os.path. > > >It functions well but it makes hard-to-read and unpythonic code. > > > > I'm not so sure. The need is not any more "urgent" today than it was 5 > > years ago, when os.path was equally "unpythonic" and unreadable. The > > problem is real but there is absolutely no reason to hurry to a > > premature solution. > > > > I've already recommended Twisted's twisted.python.filepath module as a > > possible basis for the implementation of this feature. I'm sorry I > > don't have the time to pursue that. I'm also sad that nobody else seems > > to have noticed. Twisted's implementation has an advantage that it > > doesn't seem that these new proposals do, an advantage I would really > > like to see in whatever gets seriously considered for adoption: > > Looking at > , > it seems as if FilePath was made to serve a different purpose than what we're > trying to discuss here: > > """ > I am a path on the filesystem that only permits 'downwards' access. > > Instantiate me with a pathname (for example, > FilePath('/home/myuser/public_html')) and I will attempt to only provide access > to files which reside inside that path. [...] > > The correct way to use me is to instantiate me, and then do ALL filesystem > access through me.
> """ > > What a successor to os.path needs is not security, it's a better (more pythonic, > if you like) interface to the old functionality. > Then let us discuss that. Is FilePath actually a better interface to the old functionality? Even if it was designed to solve a security problem, it might prove to be an extremely useful general interface. jml From fredrik at pythonware.com Wed Nov 1 13:10:21 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 13:10:21 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101083611.14394.883770762.divmod.xquotient.48@joule.divmod.com> Message-ID: Jonathan Lange wrote: > Then let us discuss that. Glyph's references to bike sheds went right over your head, right? From exarkun at divmod.com Wed Nov 1 15:09:48 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Wed, 1 Nov 2006 09:09:48 -0500 Subject: [Python-Dev] Path object design In-Reply-To: Message-ID: <20061101140948.20948.1876757841.divmod.quotient.8952@ohm> On Wed, 01 Nov 2006 11:06:14 +0100, Georg Brandl wrote: >glyph at divmod.com wrote: >> On 03:14 am, sluggoster at gmail.com wrote: >> >> >One thing is sure -- we urgently need something better than os.path. >> >It functions well but it makes hard-to-read and unpythonic code. >> >> I'm not so sure. The need is not any more "urgent" today than it was 5 >> years ago, when os.path was equally "unpythonic" and unreadable. The >> problem is real but there is absolutely no reason to hurry to a >> premature solution. >> >> I've already recommended Twisted's twisted.python.filepath module as a >> possible basis for the implementation of this feature. I'm sorry I >> don't have the time to pursue that. I'm also sad that nobody else seems >> to have noticed. 
Twisted's implementation has an advantage that it >> doesn't seem that these new proposals do, an advantage I would really >> like to see in whatever gets seriously considered for adoption: > >Looking at >, >it seems as if FilePath was made to serve a different purpose than what we're >trying to discuss here: > >""" >I am a path on the filesystem that only permits 'downwards' access. > >Instantiate me with a pathname (for example, >FilePath('/home/myuser/public_html')) and I will attempt to only provide access >to files which reside inside that path. [...] > >The correct way to use me is to instantiate me, and then do ALL filesystem >access through me. >""" > >What a successor to os.path needs is not security, it's a better (more pythonic, >if you like) interface to the old functionality. No. You've misunderstood the code you looked at. FilePath serves exactly the purpose being discussed here. Take a closer look. Jean-Paul From wbaxter at gmail.com Wed Nov 1 16:41:06 2006 From: wbaxter at gmail.com (Bill Baxter) Date: Wed, 1 Nov 2006 15:41:06 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <454859A6.4050904@v.loewis.de> Message-ID: Martin v. Löwis <martin at v.loewis.de> writes: > > Bill Baxter schrieb: > > Basically in my code I want to be able to take the binary data descriptor and > > say "give me the 'r' field of this pixel as an integer". > > > > Is either one (the PEP or c-types) clearly easier to use in this case? > > What > > would the code look like for handling both formats generically? > > The PEP, as specified, does not support accessing individual fields from > Python. OTOH, ctypes, as implemented, does. This comparison is not fair, > though: an *implementation* of the PEP (say, NumPy) might also give you > Python-level access to the fields. I see. So at the Python-user convenience level it's pretty much a wash. Are there significant differences in memory usage and/or performance?
ctypes sounds to be more heavyweight from the discussion. If I have a lot of image formats I want to support, is that going to mean lots of overhead with ctypes? Do I pay for it whether or not I actually end up having to handle an image in a given format? > With the PEP, you can get access to the 'r' field from C code. > Performing this access is quite tedious; as I'm uncertain whether you > actually wanted to see C code, I refrain from trying to formulate it. Actually this is more what I was after. I've written C code to interface with Numpy arrays and found it to be not so bad. But the data I was passing around was just a plain N-dimensional array of doubles. Very basic. It *sounds* like what Travis is saying is that handling a less simple case, like the one above of supporting a variety of RGB image formats, would be easier with the PEP than with ctypes. Or maybe it's generating the data in my C code that's trickier, as opposed to consuming it? I'm just trying to understand what the deal is, and at the same time perhaps inject a more concrete example into the discussion. Travis has said several times that working with ctypes, which requires a Python type per 'element', is more complicated from the C side, and I'd like to see more concretely how so, as someone who may end up needing to write such code. And I'm ok without seeing the actual code if someone can actually answer my question. The question is not whether it is tedious or not -- everything about the Python C API is tedious from what I've seen. The question is which is *more* tedious, and how significant is the difference in tediousness to the guy whose job it is to actually write the code. --bb From oliphant.travis at ieee.org Wed Nov 1 17:06:00 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 09:06:00 -0700 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information.
In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Fredrik Lundh wrote: > Terry Reedy wrote: > >> I believe that at present PyGame can only work with external images that it >> is programmed to know how to import. My guess is that if image source >> program X (such as PIL) described its data layout in a way that NumPy could >> read and act on, the import/copy step could be eliminated. > > I wish you all stopped using PIL as an example in this discussion; > for PIL 2, I'm moving towards an entirely opaque data model, with a > "data view"-style client API. That's an unreasonable request. The point of the buffer protocol is to allow people to represent their data in whatever way they like internally but still share it in a standard way. The extended buffer protocol allows sharing of the shape of the data and its format in a standard way as well. We just want to be able to convert the data in PIL objects to other Python objects without having to write special "converter" functions. It's not important how PIL or PIL 2 stores the data as long as it participates in the buffer protocol. Of course if the memory layout were compatible with the model of NumPy, then data-copies would not be required, but that is really secondary. -Travis From glyph at divmod.com Wed Nov 1 17:09:10 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 01 Nov 2006 16:09:10 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> On 10:06 am, g.brandl at gmx.net wrote: >What a successor to os.path needs is not security, it's a better (more pythonic, >if you like) interface to the old functionality. Why? I assert that it needs a better[1] interface because the current interface can lead to a variety of bugs through idiomatic, apparently correct usage. All the more because many of those bugs are related to critical errors such as security and data integrity.
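The sort of bug glyph means can be sketched in a few lines; the document root and the request paths below are invented for illustration:

```python
import os.path

WEB_ROOT = "/var/www/public"   # hypothetical document root

def resolve(user_supplied):
    # Idiomatic, apparently correct -- and wrong: os.path.join happily
    # lets ".." components (or an absolute path) escape the root.
    return os.path.normpath(os.path.join(WEB_ROOT, user_supplied))

print(resolve("index.html"))           # /var/www/public/index.html
print(resolve("../../../etc/passwd"))  # /etc/passwd -- escaped the root!

def safe_resolve(user_supplied):
    # One common repair: resolve first, then verify the result is
    # still inside the root before touching the filesystem.
    candidate = os.path.normpath(os.path.join(WEB_ROOT, user_supplied))
    if not candidate.startswith(WEB_ROOT + os.sep):
        raise ValueError("path escapes document root")
    return candidate
```

This is exactly the check that an OO layer like FilePath's 'downwards-only' child() performs for you, instead of leaving it to every call site.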
If I felt the current interface did a good job at doing the right thing in the right situation, but was cumbersome to use, I would strenuously object to _any_ work taking place to change it. This is a hard API to get right. [1]: I am rather explicitly avoiding the word "pythonic" here. It seems to have grown into a shibboleth (and its counterpart, "unpythonic", into an expletive). I have the impression it used to mean something a bit more specific, maybe adherence to Tim Peters' "Zen" (although that was certainly vague enough by itself and not always as self-evidently true as some seem to believe). More and more, now, though, I hear it used to mean 'stuff should be more betterer!' and then everyone nods sagely because we know that no filthy *java* programmer wants things to be more betterer; *we* know *they* want everything to be horrible. Words like this are a pet peeve of mine though, so perhaps I am overstating the case. Anyway, moving on... as long as I brought up the Zen, perhaps a particular couplet is appropriate here: Now is better than never. Although never is often better than *right* now. Rushing to a solution to a non-problem, e.g. the "pythonicness" of the interface, could exacerbate a very real problem, e.g. the security and data-integrity implications of idiomatic usage. Granted, it would be hard to do worse than os.path, but it is by no means impossible (just look at any C program!), and I can think of a couple of kinds of API which would initially appear more convenient but actually prove more problematic over time. That brings me back to my original point: the underlying issue here is too important a problem to get wrong *again* on the basis of a superficial "need" for an API that is "better" in some unspecified way. os.path is at least possible to get right if you know what you're doing, which is no mean feat; there are many path-manipulation libraries in many languages which cannot make that claim (especially portably). 
Its replacement might not be. Getting this wrong outside the standard library might create problems for some people, but making it worse _in_ the standard library could create a total disaster for everyone. I do believe that this wouldn't get past the dev team (least of all the release manager) but it would waste a lot less of everyone's time if we focused the inevitable continuing bike-shed discussion along the lines of discussing the known merits of widely deployed alternative path libraries, or at least an approach to *get* that data on some new code if there is consensus that existing alternatives are in some way inadequate. If for some reason it _is_ deemed necessary to go with an untried approach, I can appreciate the benefits that /F has proposed of trying to base the new interface entirely and explicitly off the old one. At least that way it will still definitely be possible to get right. There are problems with that too, but they are less severe. From oliphant.travis at ieee.org Wed Nov 1 17:44:27 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 09:44:27 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP Message-ID: Thanks for all the comments that have been given on the data-type (data-format) PEP. I'd like opinions on an idea for revising the PEP I have. What if we look at this from the angle of trying to communicate data-formats between different libraries (not change the way anybody internally deals with data-formats). For example, ctypes has one way to internally deal with data-formats (using type objects). NumPy/Numeric has a way to internally deal with data-formats (using PyArray_Descr * structure -- in Numeric it's just a C-structure but in NumPy it's fleshed out further and also a Python object called the data-type).
Numarray has a way to internally deal with data-formats (using type objects). The array module has a way to internally deal with data-formats (using a PyArray_Descr * structure -- and character codes to select one). The struct module deals with data-formats using character codes. The PIL deals with data-formats using image modes. PyVTK deals with data-formats using its own internal objects. MPI deals with data-formats using its own MPI_DataType structures. This list goes on and on. What I claim is needed in Python (to make it better glue) is to have a standard way to communicate data-format information between these extensions. Then, you don't have to build in support for all the different ways data-formats are represented by different libraries. Each library only has to be able to translate its representation to the standard way that Python uses to represent data-format. How is this goal going to be achieved? That is the real purpose of the data-type object I previously proposed. Nick showed that there are two (non-orthogonal) ways to think about this goal. 1) We could define a special string-syntax (or list syntax) that covers every special case. The array interface specification goes this direction and it requires no new Python types. This could also be seen as an extension of the "struct" module to allow for nested structures, etc. 2) We could define a Python object that specifically carries data-format information. There is also a third way (or really 2b) that has been mentioned: take one of the extensions and use what it does to communicate data-format between objects and require all other extensions to conform to that standard. The problem with 2b is that what works inside an extension module may not be the best option when it comes to communicating across multiple extension modules. Certainly none of the extension modules have argued that case effectively. Does that explain the goal of what I'm trying to do better?
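To make the two options concrete, here is a small sketch of what each might look like for an RGB pixel; both the list syntax and the descriptor class are illustrative inventions, not anything the pre-PEP specifies:

```python
# Option 1: a plain list/string syntax, in the spirit of the struct
# module and the array interface -- no new Python type required.
pixel_format = [('r', 'u1'), ('g', 'u1'), ('b', 'u1')]

# Option 2: a dedicated Python object carrying the same information.
class DataFormat:
    """Hypothetical descriptor object for a packed record."""
    def __init__(self, fields):
        self.fields = list(fields)            # [(name, typecode), ...]
        sizes = {'u1': 1, 'i4': 4, 'f8': 8}   # assumed typecode sizes
        self.itemsize = sum(sizes[code] for _name, code in self.fields)

fmt = DataFormat(pixel_format)
print(fmt.itemsize)   # -> 3
```

The trade-off sketched here is the one discussed in the thread: with option 1, every consuming library parses the description itself; with option 2, the parsing and derived quantities (like itemsize) live in one shared object.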
From oliphant.travis at ieee.org Wed Nov 1 17:58:05 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 09:58:05 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: Travis E. Oliphant wrote: > Thanks for all the comments that have been given on the data-type > (data-format) PEP. I'd like opinions on an idea for revising the PEP I > have. > > 1) We could define a special string-syntax (or list syntax) that covers > every special case. The array interface specification goes this > direction and it requires no new Python types. This could also be seen > as an extension of the "struct" module to allow for nested structures, etc. > > 2) We could define a Python object that specifically carries data-format > information. > > > Does that explain the goal of what I'm trying to do better? In other words, what I'm saying is I really want a PEP that does this. Could we have a discussion about what the best way to communicate data-format information across multiple extension modules would look like? I'm not saying my (pre-)PEP is best. The point of putting it in its infant state out there is to get the discussion rolling, not to claim I've got all the answers. It seems like there are enough people who have dealt with this issue that we ought to be able to put something very useful together that would make Python much better glue. -Travis From martin at v.loewis.de Wed Nov 1 18:48:45 2006 From: martin at v.loewis.de ("Martin v. Löwis") Date: Wed, 01 Nov 2006 18:48:45 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: <4548DDFD.5030604@v.loewis.de> Travis E. Oliphant schrieb: > What if we look at this from the angle of trying to communicate > data-formats between different libraries (not change the way anybody > internally deals with data-formats). ISTM that this is not the right approach.
If the purpose of the datatype object is just to communicate the layout in the extended buffer interface, then it should be specified in that PEP, rather than being stand-alone, and it should not pretend to serve any other purpose. Or, if it does have uses independent of the buffer extension: what are those uses? > 1) We could define a special string-syntax (or list syntax) that covers > every special case. The array interface specification goes this > direction and it requires no new Python types. This could also be seen > as an extension of the "struct" module to allow for nested structures, etc. > > 2) We could define a Python object that specifically carries data-format > information. To distinguish between these, convenience of usage (and of construction) should be taken into account. At least for the preferred alternative, but better for the runners-up, too, there should be a demonstration of how existing modules have to be changed to support it (e.g. for the struct and array modules as producers; not sure what good consumer code would be). Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B), what would the C code look like that does that? (it seems now that the primary purpose of this machinery is image manipulation) > The problem with 2b is that what works inside an extension module may > not be the best option when it comes to communicating across multiple > extension modules. Certainly none of the extension modules have argued > that case effectively. I think there are two ways in which one option could be "better" than the other: it might be more expressive, and it might be easier to use. For the second aspect (ease of use), there are two sub-aspects: it might be easier to produce, or it might be easier to consume.
Regards, Martin From g.brandl at gmx.net Wed Nov 1 19:04:27 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 01 Nov 2006 19:04:27 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> References: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> Message-ID: glyph at divmod.com wrote: > On 10:06 am, g.brandl at gmx.net wrote: > >What a successor to os.path needs is not security, it's a better (more > pythonic, > >if you like) interface to the old functionality. > > Why? > > I assert that it needs a better[1] interface because the current > interface can lead to a variety of bugs through idiomatic, apparently > correct usage. All the more because many of those bugs are related to > critical errors such as security and data integrity. AFAICS, people just want an interface that is easier to use and feels more... err... (trying to avoid the p-word). I've never seen security arguments being made in this discussion. > If I felt the current interface did a good job at doing the right thing > in the right situation, but was cumbersome to use, I would strenuously > object to _any_ work taking place to change it. This is a hard API to > get right. Well, it's hard to change any running system with that attitude. It doesn't have to be changed if nobody comes up with something that's agreed (*) to be better. 
(*) agreed in the c.l.py sense, of course Georg From fredrik at pythonware.com Wed Nov 1 19:14:15 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 19:14:15 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> References: <20061101160910.14394.707767696.divmod.xquotient.178@joule.divmod.com> Message-ID: glyph at divmod.com wrote: > I assert that it needs a better[1] interface because the current > interface can lead to a variety of bugs through idiomatic, apparently > correct usage. All the more because many of those bugs are related to > critical errors such as security and data integrity. instead of referring to some esoteric knowledge about file systems that us non-twisted-using mere mortals may not be evolved enough to understand, maybe you could just make a list of common bugs that may arise due to idiomatic use of the existing primitives? I promise to make a nice FAQ entry out of it, with proper attribution. From jimjjewett at gmail.com Wed Nov 1 19:17:42 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 1 Nov 2006 13:17:42 -0500 Subject: [Python-Dev] PEP: Adding data-type objects to Python Message-ID: I'm still not sure exactly what is missing from ctypes. To make this concrete: You have an array of 500 elements meeting

    struct { int simple;
             struct nested { char name[30];
                             char addr[45];
                             int amount;
                           } nested;
           };

ctypes can describe this as

    class nested(Structure):
        _fields_ = [("name", c_char*30), ("addr", c_char*45), ("amount", c_long)]

    class struct(Structure):
        _fields_ = [("simple", c_int), ("nested", nested)]

    desc = struct * 500

You have said that creating whole classes is too much overhead, and the description should only be an instance. To me, that particular class (arrays of 500 structs) still looks pretty lightweight. So please clarify when it starts to be a problem.
(1) For simple types -- mapping char name[30]; ==> ("name", c_char*30) Do you object to using the c_char type? Do you object to the array-of-length-30 class, instead of just having a repeat or shape attribute? Do you object to naming the field? (2) For the complex types, nested and struct Do you object to creating these two classes even once? For example, are you expecting to need different classes for each buffer, and to have many buffers created quickly? Is creating that new class a royal pain, but frequent (and slow) enough that you can't just make a call into python (or ctypes)? (3) Given that you will describe X, is X*500 (==> a type describing an array of 500 Xs) a royal pain in C? If so, are you expecting to have to do it dynamically for many sizes, and quickly enough that you can't just let ctypes do it for you? -jJ From oliphant.travis at ieee.org Wed Nov 1 19:30:07 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 11:30:07 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <4548DDFD.5030604@v.loewis.de> References: <4548DDFD.5030604@v.loewis.de> Message-ID: Martin v. L?wis wrote: > Travis E. Oliphant schrieb: >> What if we look at this from the angle of trying to communicate >> data-formats between different libraries (not change the way anybody >> internally deals with data-formats). > > ISTM that this is not the right approach. If the purpose of the datatype > object is just to communicate the layout in the extended buffer > interface, then it should be specified in that PEP, rather than being > stand-alone, and it should not pretend to serve any other purpose. I'm actually quite fine with that. If that is the consensus, then I will just go that direction. ISTM though that since we are putting forth the trouble inside the extended buffer protocol we might as well be as complete as we know how to be. > Or, if it does have uses independent of the buffer extension: what > are those uses? 
So that NumPy and ctypes and audio libraries and video libraries and database libraries and image-file format libraries can communicate about data-formats using the same expressions (in Python). Maybe we decide that ctypes-based expressions are a very good way to communicate about those things in Python for all other packages. If that is the case, then I argue that we ought to change the array module, and the struct module to conform (of course keeping the old ways for backward compatibility) and set the standard for other packages to follow. What problem do you have in defining a standard way to communicate about binary data-formats (not just images)? I still can't figure out why you are so resistant to the idea. MPI had to do it. > >> 1) We could define a special string-syntax (or list syntax) that covers >> every special case. The array interface specification goes this >> direction and it requires no new Python types. This could also be seen >> as an extension of the "struct" module to allow for nested structures, etc. >> >> 2) We could define a Python object that specifically carries data-format >> information. > > To distinguish between these, convenience of usage (and of construction) > should have to be taken into account. At least for the preferred > alternative, but better for the runners-up, too, there should be a > demonstration on how existing modules have to be changed to support it > (e.g. for the struct and array modules as producers; not sure what > good consumer code would be). Absolutely --- if something is to be made useful across packages and from Python. This is where the discussion should take place. The struct module and array modules would both be consumers as well, so that in the struct module you could specify your structure in terms of the standard data-representation and in the array module you could specify your array in terms of the standard representation instead of using "character codes".
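The duplication Travis is pointing at is visible in today's stdlib: the array and struct modules each have their own mini-language for the very same layout. A small illustration (modern Python spellings):

```python
import array
import struct

# The same data-format, spelled two different ways today:
a = array.array("d", [1.0, 2.0])     # array module: typecode "d" (C double)
s = struct.pack("2d", 1.0, 2.0)      # struct module: format string "2d"

# Identical bytes, two unrelated description languages -- exactly the
# redundancy a shared data-format object would remove.
assert a.tobytes() == s
```

A shared data-format object would let both constructors accept one description instead of each defining its own codes.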
> > Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B), > what would the C code look like that does that? (it seems now that the > primary purpose of this machinery is image manipulation) > For me it is definitely not image manipulation that is the only purpose (or even the primary purpose). It's just an easy one to explain --- (most people understand images). But, I think this question is actually irrelevant (IMHO). To me, how you change all RGB values to gray would depend on the library you are using, not on how data-formats are expressed. Maybe we are still misunderstanding each other. If you really want to know: in NumPy it might look like this. Python code:

    img['r'] = img['g']
    img['b'] = img['g']

C-code: use the Python C-API to do essentially the same thing as above, or to do img['r'] = img['g']:

    dtype = img->descr;
    r_field = PyDict_GetItemString(dtype, "r");
    g_field = PyDict_GetItemString(dtype, "g");
    r_field_dtype = PyTuple_GET_ITEM(r_field, 0);
    r_field_offset = PyTuple_GET_ITEM(r_field, 1);
    g_field_dtype = PyTuple_GET_ITEM(g_field, 0);
    g_field_offset = PyTuple_GET_ITEM(g_field, 1);
    obj = PyArray_GetField(img, g_field_dtype, g_field_offset);
    Py_INCREF(r_field_dtype);
    PyArray_SetField(img, r_field_dtype, r_field_offset, obj);

But, I still don't see how that is relevant to the question of how to represent the data-format to share that information across two extensions. >> The problem with 2b is that what works inside an extension module may >> not be the best option when it comes to communicating across multiple >> extension modules. Certainly none of the extension modules have argued >> that case effectively. > > I think there are two ways in which one option could be "better" than > the other: it might be more expressive, and it might be easier to use. > For the second aspect (ease of use), there are two sub-aspects: it might > be easier to produce, or it might be easier to consume. I like this as a means to judge a data-format representation.
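Martin's RGB-to-gray question can also be answered at the level the extended buffer protocol works at: raw memory plus a field description. Here is a stdlib-only sketch of what a consumer holding just the bytes and a {name: (format, offset)} table would do; the helper name and the table layout are invented for illustration, not any library's API:

```python
import struct

# A tiny pixel buffer: 4 RGB pixels, one unsigned byte per channel.
buf = bytearray(b"\x10\x80\x30" * 4)

# A field description a consumer might receive: name -> (struct format, offset).
fields = {"r": ("B", 0), "g": ("B", 1), "b": ("B", 2)}
record_size = 3

def copy_field(buf, dst, src):
    """Copy one field onto another across every record -- the moral
    equivalent of NumPy's img[dst] = img[src], done by hand."""
    (dfmt, doff), (sfmt, soff) = fields[dst], fields[src]
    for base in range(0, len(buf), record_size):
        val = struct.unpack_from(sfmt, buf, base + soff)[0]
        struct.pack_into(dfmt, buf, base + doff, val)

copy_field(buf, "r", "g")
copy_field(buf, "b", "g")
# every pixel is now gray: r == g == b
```

The point of a shared data-format object is that the `fields` table above could arrive from any producer, not be hard-coded per library.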
Let me summarize to see if I understand: 1) Expressive (does it express every data-format you might want or need) 2) Ease of use a) Production: How easy is it to create the representation. b) Consumption: How easy is it to interpret the representation. -Travis From brett at python.org Wed Nov 1 20:17:56 2006 From: brett at python.org (Brett Cannon) Date: Wed, 1 Nov 2006 11:17:56 -0800 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: <4548F1FD.5010505@sympatico.ca> References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> <4548F1FD.5010505@sympatico.ca> Message-ID: On 11/1/06, Stefan Seefeld wrote: > > Brett Cannon wrote: > > On 11/1/06, Stefan Seefeld wrote: > > >> Right. Brett, do we need accounts on python.org for this ? > > > > > > Yep. It just requires SSH 2 keys from each of you. You can then email > > python-dev with those keys and your first.last name and someone there > will > > install the keys for you. > > My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan Seefeld. > > Thanks ! Just to clarify, this is not for pydotorg but the svn.python.org. The admins for our future Roundup instance are going to keep their Roundup code in svn so they need commit access. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061101/e2947c52/attachment.html From oliphant.travis at ieee.org Wed Nov 1 19:50:16 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 11:50:16 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: Message-ID: <4548EC68.1020505@ieee.org> Jim Jewett wrote: > I'm still not sure exactly what is missing from ctypes. 
> To make this concrete: I think the only thing missing from ctypes "expressiveness" as far as I can tell in terms of what you "can" do is the byte-order representation. What is missing is ease of use for producers and consumers in interpreting the data-type. When I speak of producers and consumers, I'm largely talking about C-code (or Java or .NET) code writers. Producers must basically use Python code to create classes of various types. This is going to be slow in 'C'. Probably slower than the array interface (which is what we have now, informally). Consumers are going to have a hard time interpreting the result. I'm not even sure how to do that, in fact. I'd like NumPy to be able to understand ctypes as a means to specify data. Would I have to check against all the sub-types of CDataType, pull out the fields, check the tp_name of the type object? I'm not sure. It seems like a string with the C-structure would be better as a data-representation, but then a third-party library would want to parse that, so Python might as well have its own parser for data-types. So, Python might as well have its own way to describe data. My claim is this default way should *not* be overloaded by using Python type-objects (the ctypes way). I'm making the claim that the NumPy way of using a different Python object to describe data-types is the right one. I'm not saying the NumPy object should be used. I'm saying we should come up with a single DataFormatType whose instances express the data formats in ways that other packages can produce and consume (or even use internally). It would be easy for NumPy to "use" the default Python object in its PyArray_Descr * structure. It would also be easy for ctypes to "use" the default Python object in its StgDict object that is the tp_dict of every ctypes type object. It would be easy for the struct module to allow for this data-format object (instead of just strings) in its methods.
It would be easy for the array module to accept this data-format object (instead of just typecodes) in its constructor. Lots of things would suddenly be more consistent throughout both the Python and C-Python user space. Perhaps after discussion, it becomes clear that the ctypes approach is sufficient to be "that thing" that all modules use to share data-format information. It's definitely expressive enough. But, my argument is that NumPy data-type objects are also "pretty close," so why should they be rejected? We could also make a "string-syntax" do it. > > You have said that creating whole classes is too much overhead, and > the description should only be an instance. To me, that particular > class (arrays of 500 structs) still looks pretty lightweight. So > please clarify when it starts to be a problem. > > (1) For simple types -- mapping > char name[30]; ==> ("name", c_char*30) > > Do you object to using the c_char type? > Do you object to the array-of-length-30 class, instead of just having > a repeat or shape attribute? > Do you object to naming the field? > > (2) For the complex types, nested and struct > > Do you object to creating these two classes even once? For example, > are you expecting to need different classes for each buffer, and to > have many buffers created quickly? I object to the way I "consume" and "produce" the ctypes interface. It's much too slow to be used on the C-level for sharing many small buffers quickly. > > Is creating that new class a royal pain, but frequent (and slow) > enough that you can't just make a call into python (or ctypes)? > > (3) Given that you will describe X, is X*500 (==> a type describing > an array of 500 Xs) a royal pain in C? If so, are you expecting to > have to do it dynamically for many sizes, and quickly enough that you > can't just let ctypes do it for you? That pretty much sums it up (plus the pain of having to basically write Python code from "C").
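For reference, Jim's ctypes description runs essentially as written; a hedged, runnable version follows (class names adjusted so the struct module is not shadowed). The last two lines also show what "writing Python code from C" amounts to for a producer: invoking the class machinery dynamically.

```python
import ctypes

class Nested(ctypes.Structure):
    _fields_ = [("name", ctypes.c_char * 30),
                ("addr", ctypes.c_char * 45),
                ("amount", ctypes.c_long)]

class Record(ctypes.Structure):            # "struct" in Jim's sketch, renamed
    _fields_ = [("simple", ctypes.c_int),
                ("nested", Nested)]

desc = Record * 500                        # a type describing 500 records
assert ctypes.sizeof(desc) == 500 * ctypes.sizeof(Record)

# A producer without source-level classes has to build the description
# dynamically -- i.e. call the type machinery, which is what a C extension
# would have to do through the Python C-API:
Dyn = type("Dyn", (ctypes.Structure,),
           {"_fields_": [("amount", ctypes.c_long)]})
assert ctypes.sizeof(Dyn) == ctypes.sizeof(ctypes.c_long)
```

Note that the total size is platform-dependent because of alignment padding, which is part of what a data-format description has to capture.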
-Travis From jimjjewett at gmail.com Wed Nov 1 20:35:33 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Wed, 1 Nov 2006 14:35:33 -0500 Subject: [Python-Dev] Path object design Message-ID: On 10:06 am, g.brandl at gmx.net wrote: >> What a successor to os.path needs is not security, it's a better (more pythonic, >> if you like) interface to the old functionality. Glyph: > Why? > Rushing ... could exacerbate a very real problem, e.g. > the security and data-integrity implications of idiomatic usage. The proposed Path object (or new path module) is intended to replace os.path. If it can't do the equivalent of "cd ..", then it isn't a replacement; it is just another similar alternative to confuse beginners. If you're saying that a webserver should use a more restricted subclass (or even the existing FilePath alternative), then I agree. I'll even agree that a restricted version would ideally be available out of the box. I don't think it should be the only option. -jJ From brett at python.org Wed Nov 1 20:36:56 2006 From: brett at python.org (Brett Cannon) Date: Wed, 1 Nov 2006 11:36:56 -0800 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: <87slh3vuk0.fsf@uterus.efod.se> References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> <4548F1FD.5010505@sympatico.ca> <87slh3vuk0.fsf@uterus.efod.se> Message-ID: On 11/1/06, Erik Forsberg wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > "Brett Cannon" writes: > > > On 11/1/06, Stefan Seefeld wrote: > >> > >> Brett Cannon wrote: > >> > On 11/1/06, Stefan Seefeld wrote: > >> > >> >> Right. Brett, do we need accounts on python.org for this ? > >> > > >> > > >> > Yep. It just requires SSH 2 keys from each of you. 
You can then > email > >> > python-dev with those keys and your first.last name and someone there > >> will > >> > install the keys for you. > >> > >> My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan > Seefeld. > >> > >> Thanks ! > > > > > > Just to clarify, this is not for pydotorg but the svn.python.org. The > > admins for our future Roundup instance are going to keep their Roundup > code > > in svn so they need commit access. > > Now when that's clarified, here's my data: > > Public SSH key: http://efod.se/about/ptkey.pub > First.Lastname: erik.forsberg > > I'd appreciate if someone with good taste could tell us where in the > tree we should add our code :-). Right at the root: ``svn+ssh://pythondev at svn.python.org/tracker`` (or replace "tracker" without whatever name you guys want to go with). This is because the tracker code is conceptually its own project. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061101/95b687e6/attachment.htm From oliphant.travis at ieee.org Wed Nov 1 20:38:01 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 12:38:01 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: Message-ID: Jim Jewett wrote: > I'm still not sure exactly what is missing from ctypes. To make this concrete: I was too hasty. There are some things actually missing from ctypes: 1) long double (this is not the same across platforms, but it is a data-type). 2) complex-valued types (you might argue that it's just a 2-array of floats, but you could say the same thing about int as an array of bytes). The point is how do people interpret the data. Complex-valued data-types are very common. It is one reason Fortran is still used by scientists. 3) Unicode characters (there is w_char support but I mean a way to describe what kind of unicode characters you have in a cross-platform way). 
I actually think we have a way to describe encodings in the data-format representation as well. 4) What about floating-point representations that are not IEEE 754 4-byte or 8-byte? There should be a way to at least express the data-format in these cases (this is actually how long double should be handled as well, since what is actually done with the extra bits varies across platforms). So, we can't "just use ctypes" as a complete data-format representation because it's also missing some things. What we need is a standard way for libraries that deal with data-formats to communicate with each other. I need help with a PEP like this and that's what I'm asking for. It's all I've really been after all along. A couple of points: * One reason to support the idea of the Python object approach (versus a string-syntax) is that it "is already parsed". A list-syntax approach (perhaps built from strings for fundamental data-types) might also be considered "already parsed" as well. * One advantage of using "kind" versus a character for every type (like struct and array do) is that it helps consumers and producers speed up the parser (a fuller branching tree). -Travis From martin at v.loewis.de Wed Nov 1 20:49:44 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 20:49:44 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: <4548DDFD.5030604@v.loewis.de> Message-ID: <4548FA58.4050702@v.loewis.de> Travis E. Oliphant schrieb: >> Or, if it does have uses independent of the buffer extension: what >> are those uses? > > So that NumPy and ctypes and audio libraries and video libraries and > database libraries and image-file format libraries can communicate about > data-formats using the same expressions (in Python). I find that puzzling. In what way can the specification of a data type enable communication? Don't you need some kind of protocol for it (i.e. operations to be invoked)?
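The kind of communication Travis seems to have in mind needs only a naming convention, not new operations, much like the array interface mentioned earlier in the thread. A stdlib-only sketch (every name here is invented for illustration):

```python
import struct

class Producer:
    """Hypothetical library A: owns some binary records."""
    def __init__(self):
        self._buf = struct.pack("<ii", 1, 2) * 3

    @property
    def data_interface(self):
        # The agreed convention: expose the bytes plus their format.
        # "data_interface" and the dict keys are made-up names.
        return {"format": "<ii", "data": self._buf}

def consume(obj):
    """Hypothetical library B: knows nothing about A except the convention."""
    spec = obj.data_interface
    size = struct.calcsize(spec["format"])
    buf = spec["data"]
    return [struct.unpack_from(spec["format"], buf, off)
            for off in range(0, len(buf), size)]

print(consume(Producer()))  # [(1, 2), (1, 2), (1, 2)]
```

Once both sides agree on how a data-format is *spelled*, no further protocol is needed for B to interpret A's memory.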
Also, do you mean that these libraries can communicate with each other? Or with somebody else? If so, with whom? > What problem do you have in defining a standard way to communicate about > binary data-formats (not just images)? I still can't figure out why you > are so resistant to the idea. MPI had to do it. I'm afraid of "dead" specifications, things whose only motivation is that they look nice. They are just clutter. There are a few examples of this already in Python, like the character buffer interface or the multi-segment buffers. As for MPI: It didn't just independently define a data types system. Instead, it did that, *and* specified the usage of the data types in operations such as MPI_SEND. It is very clear what the scope of this data description is, and what the intended usage is. Without specifying an intended usage, it is impossible to evaluate whether the specification meets its goals. > Absolutely --- if something is to be made useful across packages and > from Python. This is where the discussion should take place. The > struct module and array modules would both be consumers also so that in > the struct module you could specify your structure in terms of the > standard data-represenation and in the array module you could specify > your array in terms of the standard representation instead of using > "character codes". Ok, that would be a new usage: I expected that datatype instances always come in pairs with memory allocated and filled according to the description. If you are proposing to modify/extend the API of the struct and array modules, you should say so somewhere (in a PEP). >> Suppose I wanted to change all RGB values to a gray value (i.e. R=G=B), >> what would the C code look like that does that? (it seems now that the >> primary purpose of this machinery is image manipulation) >> > > For me it is definitely not image manipulation that is the only purpose > (or even the primary purpose). 
It's just an easy one to explain --- > most people understand images). But, I think this question is actually > irrelevant (IMHO). To me, how you change all RGB values to gray would > depend on the library you are using not on how data-formats are expressed. > > Maybe we are still mis-understanding each other. I expect that the primary readers/users of the PEP would be people who have to write libraries: i.e. people implementing NumPy, struct, array, and people who implement algorithms that operate on data. So usability of the specification is a matter of how easy it is to *write* a library that does perform the image manipulation. > If you really want to know. In NumPy it might look like this: > > Python code: > > img['r'] = img['g'] > img['b'] = img['g'] That's not what I'm asking. Instead, what does the NumPy code look like that gets invoked on these read-and-write operations? Does it only use the void* pointing to the start of the data, and the datatype object? If not, how would C code look like that only has the void* and the datatype object? > dtype = img->descr; In this code, is descr a datatype object? ... > r_field = PyDict_GetItemString(dtype,'r'); ... I guess not, because apparently, it is a dictionary, not a datatype object. > But, I still don't see how that is relevant to the question of how to > represent the data-format to share that information across two extensions. Well, if NumPy gets the data from a different module, it can't assume there is a descr object that is a dictionary. Instead, it must perform these operations just by using the datatype object. What else is the purpose of sharing the information, if not to use it to access the data? 
Regards, Martin From martin at v.loewis.de Wed Nov 1 21:05:28 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 21:05:28 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: Message-ID: <4548FE08.7070402@v.loewis.de> Travis E. Oliphant schrieb: > I was too hasty. There are some things actually missing from ctypes: I think Thomas can correct me if I'm wrong: I think endianness is supported (although this support seems undocumented). There seems to be code that checks for the presence of a _byteswapped_ attribute on fields of a struct; presence of this field is then interpreted as data having the "other" endianness. > 1) long double (this is not the same across platforms, but it is a > data-type). That's indeed missing. > 2) complex-valued types (you might argue that it's just a 2-array of > floats, but you could say the same thing about int as an array of > bytes). The point is how do people interpret the data. Complex-valued > data-types are very common. It is one reason Fortran is still used by > scientists. Well, by the same reasoning, you could argue that pixel values (RGBA) are missing in the PEP. It's a convenience, sure, and it may also help interfacing with the platform's FORTRAN implementation - however, are you sure that NumPy's complex layout is consistent with the platform's C99 _Complex definition? > 3) Unicode characters > > 4) What about floating-point representations that are not IEEE 754 > 4-byte or 8-byte. Both of these are available in a platform-dependent way: if the platform uses non-IEEE754 formats for C float and C double, ctypes will interface with that just fine. It is actually vice versa: IEEE-754 4-byte and 8-byte is not supported in ctypes. Same for Unicode: the platform's wchar_t is supported (as you said), but not a platform-independent (say) 4-byte little-endian. 
Regards, Martin From sluggoster at gmail.com Wed Nov 1 21:14:53 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 12:14:53 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <6e9196d20611011011m3b04225ao3b51b015accfa0a7@mail.gmail.com> References: <6e9196d20611011011m3b04225ao3b51b015accfa0a7@mail.gmail.com> Message-ID: <6e9196d20611011214g5bf63839j24ee976a0a0d4c67@mail.gmail.com> Argh, it's difficult to respond to one topic that's now spiraling into two conversations on two lists. glyph at divmod.com wrote: > On 03:14 am, sluggoster at gmail.com wrote: > > >One thing is sure -- we urgently need something better than os.path. > >It functions well but it makes hard-to-read and unpythonic code. > > I'm not so sure. The need is not any more "urgent" today than it was > 5 years ago, when os.path was equally "unpythonic" and unreadable. > The problem is real but there is absolutely no reason to hurry to a > premature solution. Except that people have had to spend five years putting hard-to-read os.path functions in their code, or reinventing the wheel with their own libraries that they're not sure they can trust. I started to use path.py last year when it looked like it was emerging as the basis of a new standard, but yanked it out again when it was clear the API would be different by the time it's accepted. I've gone back to os.path for now until something stable emerges but I really wish I didn't have to. > I've already recommended Twisted's twisted.python.filepath module as a > possible basis for the implementation of this feature.... > *It is already used in a large body of real, working code, and > therefore its limitations are known.* This is an important consideration. However, to me a clean API is more important. Since we haven't agreed on an API there is no widely-used module that implements it... it's a chicken-and-egg problem since it takes significant time to write and test an implementation.
So I'd like to start from the standpoint of an ideal API rather than just taking the API of the most widely-used implementation. os.path is clearly the most widely-used implementation, but that doesn't mean that OOizing it as-is would be my favorite choice. I took a quick look at filepath. It looks similar in concept to PEP 355. Four concerns:

- unfamiliar method names (createDirectory vs mkdir, child vs join)
- basename/dirname/parent are methods rather than properties: leads to () overproliferation in user code.
- the "secure" features may not be necessary. If they are, this should be a separate discussion, and perhaps implemented as a subclass.
- stylistic objection to verbose camelCase names like createDirectory

> Proposals for extending the language are contentious and it is very > difficult to do experimentation with non-trivial projects because > nobody wants to do that and then end up with a bunch of code written > in a language that is no longer supported when the experiment fails. True. > Path representation is a bike shed. Nobody would have proposed > writing an entirely new embedded database engine for Python: python > 2.5 simply included SQLite because its utility was already proven. There's a quantum level of difference between path/file manipulation -- which has long been considered a requirement for any full-featured programming language -- and a database engine which is much more complex. Georg Brandl wrote: > I have been a supporter of the full-blown Path object in the past, but the > recent discussions have convinced me that it is just too big and too confusing, > and that you can't kill too many birds with one stone in this respect. > Most of the ugliness really lies in the path name manipulation functions, which > nicely map to methods on a path name object. Fredrik has convinced me that it's more urgent to OOize the pathname conversions than the filesystem operations.
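The pathname-only wrapper under discussion can be sketched in a few lines: a str subclass whose methods just delegate to the os.path functions, so it mixes freely with plain strings. The method names here are illustrative, not any PEP's actual API; posixpath is used explicitly so the example behaves the same on every platform:

```python
import posixpath

class Path(str):
    """Minimal sketch of a pathname-algebra wrapper (hypothetical API)."""
    def join(self, *parts):
        return Path(posixpath.join(self, *parts))

    def dirname(self):
        return Path(posixpath.dirname(self))

    def basename(self):
        return Path(posixpath.basename(self))

    def splitext(self):
        root, ext = posixpath.splitext(self)
        return Path(root), ext

p = Path("/usr/local").join("lib", "site.py")
assert p == "/usr/local/lib/site.py"   # still usable anywhere a str is
assert p.basename() == "site.py"
assert p.splitext()[1] == ".py"
```

Because it subclasses str, nothing else in a program has to know the wrapper exists, which is what makes this kind of change backward compatible.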
Pathname conversions are the ones that frequently get nested or chained, whereas filesystem operations are usually done at the top level of a program statement, or return a different "kind" of value (stat, true/false, etc). However, it's interesting that all the proposals I've seen in the past three years have been a "monolithic" OO class. Clearly there are a lot of people who prefer this way, or at least have never heard of anything different. Where have all the proponents of non-OO or limited-OO strategies been? The first proposal of that sort I've seen was Nick Coghlan's, on October 1. Have y'all just been ignoring the monolithic OO efforts without offering any alternatives? Fredrik Lundh wrote: > > This is fully backwards compatible, can go right into 2.6 without > > breaking anything, allows people to update their code as they go, > > and can be incrementally improved in future releases: > > > > 1) Add a pathname wrapper to "os.path", which lets you do basic > > path "algebra". This should probably be a subclass of unicode, > > and should *only* contain operations on names. > > > > 2) Make selected "shutil" operations available via the "os" name- > > space; the old POSIX API vs. POSIX SHELL distinction is pretty > > irrelevant. Also make the os.path predicates available via the > > "os" namespace. > > > > This gives a very simple conceptual model for the user; to manipulate > > path *names*, use "os.path.(string)" functions or the "" > > wrapper. To manipulate *objects* identified by a path, given either as > > a string or a path wrapper, use "os.(path)". This can be taught in > > less than a minute. Making this more concrete, I think Fredrik is suggesting: - Make (os.path) abspath, basename, commonprefix, dirname, expanduser, expandvars, isabs, join, normcase, normpath, split, splitdrive, splitext, splitunc methods of a Path object.
- Copy functions into os: (os.path) exists, lexists, get{atime,mtime,ctime,size}, is{file,dir,link,mount}, realpath, samefile, sameopenfile, samestat, (shutil) copy, copy2, copy{file,fileobj,mode,stat,tree}, rmtree, move. - Deprecate the old functions to remove in 3.0. - Abandon os.path.walk because os.walk is better. This is worth considering as a start. It does mean moving a lot of functions that may be moved again at some point in the future. If we do move shutil functions into os, I'd at least like to make some tiny improvements in them. Adding four lines to the beginning of rmtree would make it behave like my purge() function without detracting from its existing use:

    if not os.exists(p):
        return
    if not os.isdir(p):
        p.remove()

Also, do we really need six copy methods? copy2 can be handled by a third argument, etc. -- Mike Orr From oliphant.travis at ieee.org Wed Nov 1 21:18:23 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 13:18:23 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <4548FA58.4050702@v.loewis.de> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> Message-ID: <4549010F.6090200@ieee.org> Martin v. L?wis wrote: > Travis E. Oliphant schrieb: > >>> Or, if it does have uses independent of the buffer extension: what >>> are those uses? >>> >> So that NumPy and ctypes and audio libraries and video libraries and >> database libraries and image-file format libraries can communicate about >> data-formats using the same expressions (in Python). >> > > I find that puzzling. In what way can the specification of a data type > enable communication? Don't you need some kind of protocol for it > (i.e. operations to be invoked)? Also, do you mean that these libraries > can communicate with each other? Or with somebody else? If so, with > whom? > What is puzzling? I've just specified the extended buffer protocol as something concrete that data-format objects are shared through.
That's on the C-level. I gave several examples of where such sharing would be useful. Then, I gave examples in Python of how sharing data-formats would also be useful so that modules could support the same means to construct data-formats (instead of struct using strings, array using typecodes, ctypes using its type-objects, and NumPy using dtype objects). > >> What problem do you have in defining a standard way to communicate about >> binary data-formats (not just images)? I still can't figure out why you >> are so resistant to the idea. MPI had to do it. >> > > I'm afraid of "dead" specifications, things whose only motivation is > that they look nice. They are just clutter. There are a few examples > of this already in Python, like the character buffer interface or > the multi-segment buffers. > O.K. I can understand that concern. But, all you do is make struct, array, and ctypes support the same data-format specification (by support I mean have a way to "consume" and "produce" the data-format object to the natural representation that they have internally) and you are guaranteed it won't "die." In fact, what would be ideal is for the PIL, NumPy, CVXOpt, PyMedia, PyGame, pyre, pympi, PyVoxel, etc., etc. (there really are many modules that should be able to talk to each other more easily) to all support the same data-format representations. Then, you don't have to learn everybody's re-invention of the same concept whenever you encounter a new library that does something with binary data. How much time do you actually spend with binary data (sound, video, images, just plain numbers from a scientific experiment) and trying to use multiple Python modules to manipulate it? If you don't spend much time, then I can understand why you don't understand the need. > As for MPI: It didn't just independently define a data types system. > Instead, it did that, *and* specified the usage of the data types > in operations such as MPI_SEND.
It is very clear what the scope of > this data description is, and what the intended usage is. > > Without specifying an intended usage, it is impossible to evaluate > whether the specification meets its goals. > What is not understood about the intended usage in the extended buffer protocol? What is not understood about the intended usage of giving the array and struct modules a uniform way to represent binary data? > Ok, that would be a new usage: I expected that datatype instances > always come in pairs with memory allocated and filled according to > the description. To me that is the most important usage, but it's not the *only* one. > If you are proposing to modify/extend the API > of the struct and array modules, you should say so somewhere (in > a PEP). > Sure, I understand that. But, if there is no data-format object, then there is no PEP to "extend the struct and array modules" to support it. Chicken before the egg, and all that. > I expect that the primary readers/users of the PEP would be people who > have to write libraries: i.e. people implementing NumPy, struct, array, > and people who implement algorithms that operate on data. Yes, but not only them. If it's a default way to represent data, then *users* of those libraries that "consume" the representation would also benefit by learning a standard. > So usability > of the specification is a matter of how easy it is to *write* a library > that does perform the image manipulation. > > >> If you really want to know. In NumPy it might look like this: >> >> Python code: >> >> img['r'] = img['g'] >> img['b'] = img['g'] >> > > That's not what I'm asking. Instead, what does the NumPy code look > like that gets invoked on these read-and-write operations? Does it > only use the void* pointing to the start of the data, and the > datatype object? If not, how would C code look like that only has > the void* and the datatype object? > > >> dtype = img->descr; >> > > In this code, is descr a datatype object? ...
> Yes. But, I have a mistake later... > >> r_field = PyDict_GetItemString(dtype,'r'); >> Actually it should read PyDict_GetItemString(dtype->fields, "r"). The r_field is a tuple (data-type object, offset). The fields attribute is (currently) a Python dictionary. > > ... I guess not, because apparently, it is a dictionary, not > > a datatype object. > Sorry for the confusion. > >> But, I still don't see how that is relevant to the question of how to >> represent the data-format to share that information across two extensions. >> > > Well, if NumPy gets the data from a different module, it can't assume > there is a descr object that is a dictionary. Instead, it must > perform these operations just by using the datatype object. Right. I see. Again, I made a mistake in the code. img->descr is a data-type object in NumPy. img->descr->fields is a dictionary of fields keyed by 'name' and returning a tuple (data-type object, offset). But, the other option (especially for code already written) would be to just convert the data-format specification into its own internal representation. This is the case that I was thinking about when I said it didn't matter how the library operated on the data. If new code wanted to use the data-format object as *the* internal representation, then it would matter. > What else is the purpose of sharing the information, if not to use it > to access the data? > Of course. I'm sorry my example was incorrect. I guess this falls under the category of "ease of use". If the data-type format can *be* the internal representation, then ease of use is *optimal* because no translation is required. In my ideal world that's the way it would be. But, even if we can't get there immediately, we can at least define a standard for communication.
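[For readers following along at the Python level: the fields mapping Travis describes can be seen in NumPy itself. A minimal sketch -- the field names and single-byte layout below are just an illustration, not anything specified by the PEP:]

```python
import numpy as np

# A pixel record with three single-byte fields, as in the img example.
pixel = np.dtype([('r', 'u1'), ('g', 'u1'), ('b', 'u1')])

# The fields attribute is a mapping keyed by field name; each value
# is a (data-type object, byte offset) tuple, as described above.
r_dtype, r_offset = pixel.fields['r']
g_dtype, g_offset = pixel.fields['g']

# The Python-level operations from the earlier example:
img = np.zeros((4, 4), dtype=pixel)
img['g'] = 7
img['r'] = img['g']
img['b'] = img['g']
```

[Here r_offset is 0 and g_offset is 1: exactly the (data-type, offset) pairs the C code pulls out of dtype->fields.]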
From alexander.belopolsky at gmail.com Wed Nov 1 21:52:43 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 1 Nov 2006 20:52:43 +0000 (UTC) Subject: [Python-Dev] idea for data-type (data-format) PEP References: Message-ID: Travis E. Oliphant ieee.org> writes: > What if we look at this from the angle of trying to communicate > data-formats between different libraries (not change the way anybody > internally deals with data-formats). > > For example, ctypes has one way to internally deal with data-formats > (using type objects). > > NumPy/Numeric has a way to internally deal with data-formats (using > PyArray_Descr * structure -- in Numeric it's just a C-structure but in > NumPy it's fleshed out further and also a Python object called the > data-type). > Ctypes and NumPy's Array Interface address two different needs. When using ctypes, producers of type information are at the Python level, but Array Interface information is produced in C code. It is very convenient to write c_int*2*3 to specify a 2x3 integer matrix in Python, but it is much easier to set type code to 'i' and populate the shape array with integers in C. Consumers of type information are at the C level in both ctypes and Array Interface applications, but in the case of ctypes, users are not expected to write C code. It is typical for an array interface consumer to switch on the type code. Single character (or numeric) type codes are much more convenient than verbose type names in this case. I have used Array Interface extensively, but only for simple types and I have studied ctypes from Python level, but not from C level. I think the standard data type description object should build on the strengths of both approaches. I believe the first step should be to agree on a representation of simple types. Just an agreement on the standard type codes that every module could use would be a great improvement. (Personally, I don't need anything else from array interface.) 
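[The contrast drawn here can be made concrete. A sketch of the two styles of producing the same element-type information -- the shapes and values are illustrative only:]

```python
import ctypes
from array import array

# ctypes style: type information is built up at the Python level.
# (c_int * 2) * 3 describes a block of 3 rows of 2 ints; the text
# above writes this as c_int*2*3.
Matrix = ctypes.c_int * 2 * 3
m = Matrix()
m[0][1] = 42

# array-module style: the element type is the one-letter code 'i',
# the kind of tag a C-level consumer can cheaply switch on.
a = array('i', [1, 2, 3])
```

[Both describe "C int" elements; the ctypes spelling is convenient to produce in Python, while the typecode is convenient to consume in C.]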
I don't like letter codes, however. I would prefer to use an enum at the C level and verbose names at Python level. I would also like to mention one more difference between NumPy datatypes and ctypes that I did not see discussed. In ctypes arrays of different shapes are represented using different types. As a result, if the object exporting its buffer is resized, the datatype object cannot be reused; it has to be replaced. From martin at v.loewis.de Wed Nov 1 21:54:33 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 21:54:33 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <4549010F.6090200@ieee.org> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> <4549010F.6090200@ieee.org> Message-ID: <45490989.9010603@v.loewis.de> Travis Oliphant schrieb: >>> r_field = PyDict_GetItemString(dtype,'r'); >>> > Actually it should read PyDict_GetItemString(dtype->fields). The > r_field is a tuple (data-type object, offset). The fields attribute is > (currently) a Python dictionary. Ok. This seems to be missing in the PEP. The section titled "Attributes" seems to talk about Python-level attributes. Apparently, you are suggesting that there is also a C-level API, lower than PyObject_GetAttrString, so that you can write dtype->fields, instead of having to write PyObject_GetAttrString(dtype, "fields"). If it is indeed the intent that this kind of access is available for datatype objects, then the PEP should specify it. Notice that it would be uncommon for a type in Python: Most types have getter functions (such as PyComplex_RealAsDouble, rather than specifying direct access through obj->cval.real).
Going now back to your original code (and assuming proper adjustments):

    dtype = img->descr;
    r_field = PyDict_GetItemString(dtype, "r");
    g_field = PyDict_GetItemString(dtype, "g");
    r_field_dtype = PyTuple_GET_ITEM(r_field, 0);
    r_field_offset = PyTuple_GET_ITEM(r_field, 1);
    g_field_dtype = PyTuple_GET_ITEM(g_field, 0);
    g_field_offset = PyTuple_GET_ITEM(g_field, 1);
    obj = PyArray_GetField(img, g_field, g_field_offset);
    Py_INCREF(r_field);
    PyArray_SetField(img, r_field, r_field_offset, obj);

In this code, where is PyArray_GetField coming from? What does it do? If I wanted to write this code from scratch, what should I write instead? Since this is all about a flat memory block, I'm surprised I need "true" Python objects for the field values in there. > But, the other option (especially for code already written) would be to > just convert the data-format specification into its own internal > representation. Ok, so your assumption is that consumers already have their own machinery, in which case ease-of-use would be the question of how difficult it is to convert datatype objects into the internal representation. Regards, Martin From alexander.belopolsky at gmail.com Wed Nov 1 22:05:17 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 1 Nov 2006 21:05:17 +0000 (UTC) Subject: [Python-Dev] idea for data-type (data-format) PEP References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > > I'm afraid of "dead" specifications, things whose only motivation is > that they look nice. They are just clutter. There are a few examples > of this already in Python, like the character buffer interface or > the multi-segment buffers. > Multi-segment buffers are only dead because standard library modules do not support them. I often work with text data that is represented as an array of strings.
I would love to implement a multi-segment buffer interface on top of that data and be able to do a full text regular expression search without having to concatenate into one big string, but Python's re module would not take a multi-segment buffer. From anthony at python.org Wed Nov 1 11:50:32 2006 From: anthony at python.org (Anthony Baxter) Date: Wed, 1 Nov 2006 21:50:32 +1100 Subject: [Python-Dev] RELEASED Python 2.3.6, FINAL Message-ID: <200611012150.44644.anthony@python.org> On behalf of the Python development team and the Python community, I'm happy to announce the release of Python 2.3.6 (FINAL). Python 2.3.6 is a security bug-fix release. While Python 2.5 is the latest version of Python, we're making this release for people who are still running Python 2.3. Unlike the recently released 2.4.4, this release only contains a small handful of security-related bugfixes. See the website for more. * Python 2.3.6 contains a fix for PSF-2006-001, a buffer overrun * in repr() of unicode strings in wide unicode (UCS-4) builds. * See http://www.python.org/news/security/PSF-2006-001/ for more. This is a **source only** release. The Windows and Mac binaries of 2.3.5 were built with UCS-2 unicode, and are therefore not vulnerable to the problem outlined in PSF-2006-001. The PCRE fix is for a long-deprecated module (you should use the 're' module instead) and the email fix can be obtained by downloading the standalone version of the email package. Most vendors who ship Python should have already released a patched version of 2.3.5 with the above fixes; this release is for people who need or want to build their own release, but don't want to mess around with patch or svn. There have been no changes (apart from the version number) since the release candidate of 2.3.6. Python 2.3.6 will complete python.org's response to PSF-2006-001.
If you're still on Python 2.2 for some reason and need to work with UCS-4 unicode strings, please obtain the patch from the PSF-2006-001 security advisory page. Python 2.4.4 and Python 2.5 have both already been released and contain the fix for this security problem. For more information on Python 2.3.6, including download links for source archives, release notes, and known issues, please see: http://www.python.org/2.3.6 Highlights of this new release include: - A fix for PSF-2006-001, a bug in repr() for unicode strings on UCS-4 (wide unicode) builds. - Two other, less critical, security fixes. Enjoy this release, Anthony Anthony Baxter anthony at python.org Python Release Manager (on behalf of the entire python-dev team) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061101/cb5f29ef/attachment-0002.pgp -------------- next part -------------- -- http://mail.python.org/mailman/listinfo/python-announce-list Support the Python Software Foundation: http://www.python.org/psf/donations.html From martin at v.loewis.de Wed Nov 1 22:13:29 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 01 Nov 2006 22:13:29 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: <45490DF9.4070500@v.loewis.de> Alexander Belopolsky schrieb: > I would also like to mention one more difference between NumPy datatypes > and ctypes that I did not see discussed. In ctypes arrays of different > shapes are represented using different types. As a result, if the object > exporting its buffer is resized, the datatype object cannot be reused, it > has to be replaced. That's also an interesting issue for the datatypes PEP: are datatype objects meant to be immutable? 
This is particularly interesting for the extended buffer protocol: how long can one keep the data you get from bt_getarrayinfo? Also, how does the memory management work for the results? Regards, Martin From Chris.Barker at noaa.gov Wed Nov 1 20:20:47 2006 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed, 1 Nov 2006 19:20:47 +0000 (UTC) Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. References: <4547BF86.6070806@v.loewis.de> Message-ID: Martin v. L?wis v.loewis.de> writes: > Can you please give examples for real-world applications of this > interface, preferably examples involving multiple > independently-developed libraries? OK -- here's one I haven't seen in this thread yet: wxPython has a lot of code to translate between various Python data types and wx data types. An example is PointListHelper. This code examines the input Python data, and translates it to a wxList of wxPoints. It is used in a bunch of the drawing functions, for instance. It has some nifty optimizations so that if a Python list of (x,y) tuples is passed in, then the code uses PyList_GetItem() to access the tuples, for instance. If an Nx2 numpy array is passed in, it defaults to PySequence_GetItem() to get the (x,y) pair, and then again to get the values, which are converted to Python numbers, then checked and converted again to C ints. The result is an awful lot of processing, even though the data in the numpy array already exists in a C array that could be exactly the same as the wxList of wxPoints (in fact, many of the drawing methods take a pointer to a correctly formatted C array of data). Right now, it is faster to convert your numpy array of points to a Python list of tuples first, then pass it in to wx. However, were there a standard way to describe a buffer (pointer to a C array of data), then the PointListHelper code could look to see if the data is already correctly formatted, and pass the pointer right through.
If it was not, it could still do the translation (like from doubles to ints, for instance) far more efficiently. When I get the chance, I do intend to contribute code to support this in wxPython, using the numpy array interface. However, wouldn't it be better for it to support a generic interface that was in the standard lib, rather than only numpy? While /F suggested we get off the PIL bandwagon, I do have code that has to pass data around between numpy, PIL and wx.Images ( and matplotlib AGG buffers, and GDAL geo-referenced image buffers, and ...). Most do support the current buffer protocol, so it can be done, but I'd be much happier if there was a little more checking going on, rather than my python code having to make sure the data is all arranged in memory the right way. Oh, there is also the Python Cartographic Library, which can take a Python list of tuples as coordinates, and do a Projection on them, but which can't take a numpy array holding that same data. -Chris From fredrik at pythonware.com Wed Nov 1 22:46:29 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Wed, 01 Nov 2006 22:46:29 +0100 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Chris Barker wrote: > While /F suggested we get off the PIL bandwagon I suggest we drop the obsession with pointers to memory areas that are supposed to have a specific format; modern data access API:s don't work that way for good reasons, so I don't see why Python should grow a standard based on that kind of model. the "right solution" for things like this is an *API* that lets you do things like:

    view = object.acquire_view(region, supported formats)
    ... access data in view ...
    view.release()

and, for advanced users

    format = object.query_format(constraints)

From glyph at divmod.com Wed Nov 1 22:57:24 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 01 Nov 2006 21:57:24 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> On 06:14 pm, fredrik at pythonware.com wrote: >glyph at divmod.com wrote: > >> I assert that it needs a better[1] interface because the current >> interface can lead to a variety of bugs through idiomatic, apparently >> correct usage. All the more because many of those bugs are related to >> critical errors such as security and data integrity. >instead of referring to some esoteric knowledge about file systems that >us non-twisted-using mere mortals may not be evolved enough to under- >stand, On the contrary, twisted users understand even less, because (A) we've been demonstrated to get it wrong on numerous occasions in highly public and embarrassing ways and (B) we already have this class that does it all for us and we can't remember how it works :-). >maybe you could just make a list of common bugs that may arise >due to idiomatic use of the existing primitives? Here are some common gotchas that I can think of off the top of my head. Not all of these are resolved by Twisted's path class: Path manipulation: * This is confusing as heck:

    >>> os.path.join("hello", "/world")
    '/world'
    >>> os.path.join("hello", "slash/world")
    'hello/slash/world'
    >>> os.path.join("hello", "slash//world")
    'hello/slash//world'

Trying to formulate a general rule for what the arguments to os.path.join are supposed to be is really hard. I can't really figure out what it would be like on a non-POSIX/non-win32 platform. * it seems like slashes should be more aggressively converted to backslashes on windows, because it's near impossible to do anything with os.sep in the current situation. * "C:blah" does not mean what you think it means on Windows.
Regardless of what you think it means, it is not that. I thought I understood it once as the current process having a current directory on every mapped drive, but then I had to learn about UNC paths of network mapped drives and it stopped making sense again. * There are special files on windows such as "CON" and "NUL" which exist in _every_ directory. Twisted does get around this, by looking at the result of abspath:

    >>> os.path.abspath("c:/foo/bar/nul")
    '\\\\nul'

* Sometimes a path isn't a path; the zip "paths" in sys.path are a good example. This is why I'm a big fan of including a polymorphic interface of some kind: this information is *already* being persisted in an ad-hoc and broken way now, so it needs to be represented; it would be good if it were actually represented properly. URL manipulation-as-path-manipulation is another; the recent perforce use-case mentioned here is a special case of that, I think. * paths can have spaces in them and there's no convenient, correct way to quote them if you want to pass them to some gross function like os.system - and a lot of the code that manipulates paths is shell-script-replacement crud which wants to call gross functions like os.system. Maybe this isn't really the path manipulation code's fault, but it's where people start looking when they want properly quoted path arguments. * you have to care about unicode sometimes. rarely enough that none of your tests will ever account for it, but often enough that _some_ users will notice breakage if your code is ever widely distributed. this is an even more obscure example, but pygtk always reports pathnames in utf8-encoded *byte* strings, regardless of your filesystem encoding. If you forget to decode/encode it, hilarity ensues. There's no consistent error reporting (as far as I can tell, I have encountered this rarely) and no real way to detect this until you have an actual insanely-configured system with an insanely-named file on it to test with.
(Polymorphic interfaces might help a *bit* here. At worst, they would at least make it possible to develop a canonical "insanely encoded filesystem" test-case backend. At best, you'd absolutely have to work in terms of unicode all the time, and no implicit encoding issues would leak through to application code.) Twisted's thing doesn't deal with this at all, and it really should. * also *sort* of an encoding issue, although basically only for webservers or other network-accessible paths: thanks to some of these earlier issues as well as %2e%2e, there are effectively multiple ways to spell "..". Checking for all of them is impossible, you need to use the os.path APIs to determine if the paths you've got really relate in the ways you think they do. * os.pathsep can be, and actually sometimes is, embedded in a path. (again, more of a general path problem, not really python's fault) * relative path manipulation is difficult. ever tried to write the function to iterate two separate trees of files in parallel? shutil re-implements this twice completely differently via recursion, and it's harder to do with a generator (which is what you really want). you can't really split on os.sep and have it be correct due to the aforementioned windows-path issue, but that's what everybody does anyway. * os.path.split doesn't work anything like str.split. FS manipulation: * although individual operations are atomic, shutil.copytree and friends aren't. I've often seen python programs confused by partially-copied trees of files. This isn't even really an atomicity issue; it's often due to a traceback in the middle of a running python program which leaves the tree half-broken. * the documentation really can't emphasize enough how bad using 'os.path.exists/isfile/isdir', and then assuming the file continues to exist when it is a contended resource, is. It can be handy, but it is _always_ a race condition. >I promise to make a nice FAQ entry out of it, with proper attribution. Thanks. 
The list here is just a brain dump, I'm not sure it's all appropriate for a FAQ, but I hope some of it is useful. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061101/b6052fc2/attachment.html From alexander.belopolsky at gmail.com Wed Nov 1 22:58:42 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 1 Nov 2006 16:58:42 -0500 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <45490DF9.4070500@v.loewis.de> References: <45490DF9.4070500@v.loewis.de> Message-ID: On 11/1/06, "Martin v. L?wis" wrote: > That's also an interesting issue for the datatypes PEP: are datatype > objects meant to be immutable? > That's a question for Travis, but I would think that they would be immutable at the Python level, but mutable at the C level. In Travis' approach array size is not stored in the datatype, so I don't see much need to modify datatype objects in-place. It may be reasonable to allow adding fields to a record, but I don't have enough experience with that to comment. > This is particularly interesting for the extended buffer protocol: > how long can one keep the data you get from bt_getarrayinfo? > I think your question is limited to shape and strides outputs because dataformat is a reference counted PyObject (and PEP should specify whether it is a borrowed reference). And the answer is the same as for the data from bf_getreadbuffer/bf_getwritebuffer . AFAIK, existing buffer protocol does not answer this question delegating it to the extension module writers who provide objects exporting their buffers. > Also, how does the memory management work for the results? I think it is implied that all pointers are borrowed references. I could not find any discussion of memory management in the current buffer protocol documentation. This is a good question. 
It may be the case that the shape or stride information is not available as Py_intptr_t array inside the object that wants to export its memory buffer. This is not theoretical; I have a 64-bit application that uses objects that keep their size information in a 32-bit int. BTW, I think the memory management issues with the buffer objects have been resolved at some point. Any lessons to learn from that? From oliphant.travis at ieee.org Wed Nov 1 23:22:52 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 15:22:52 -0700 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Fredrik Lundh wrote: > Chris Barker wrote: > > >>While /F suggested we get off the PIL bandwagon > > > I suggest we drop the obsession with pointers to memory areas that are > supposed to have a specific format; modern data access API:s don't work > that way for good reasons, so I don't see why Python should grow a > standard based on that kind of model. > Please give us an example of a modern data-access API (i.e. an application that uses one)? I presume you are not fundamentally opposed to sharing memory given the example you gave. > the "right solution" for things like this is an *API* that lets you do > things like: > > view = object.acquire_view(region, supported formats) > ... access data in view ... > view.release() > > and, for advanced users > > format = object.query_format(constraints) > It sounds like you are concerned about the memory-area-not-current problem. Yeah, it can be a problem (but not an unsolvable one). Objects that share memory through the buffer protocol just have to be careful about resizing themselves or eliminating memory. Anyway, it's a problem not solved by the buffer protocol. I have no problem with trying to fix that in the buffer protocol, either. It's all completely separate from what I'm talking about as far as I can tell.
-Travis From glyph at divmod.com Wed Nov 1 23:29:03 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 01 Nov 2006 22:29:03 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061101222903.14394.973593042.divmod.xquotient.391@joule.divmod.com> On 08:14 pm, sluggoster at gmail.com wrote: >Argh, it's difficult to respond to one topic that's now spiraling into >two conversations on two lists. >glyph at divmod.com wrote: >(...) people have had to spend five years putting hard-to-read >os.path functions in the code, or reinventing the wheel with their own >libraries that they're not sure they can trust. I started to use >path.py last year when it looked like it was emerging as the basis of >a new standard, but yanked it out again when it was clear the API >would be different by the time it's accepted. I've gone back to >os.path for now until something stable emerges but I really wish I >didn't have to. You *don't* have to. This is a weird attitude I've encountered over and over again in the Python community, although sometimes it masquerades as resistance to Twisted or Zope or whatever. It's OK to use libraries. It's OK even to use libraries that Guido doesn't like! I'm pretty sure the first person to tell you that would be Guido himself. (Well, second, since I just told you.) If you like path.py and it solves your problems, use path.py. You don't have to cram it into the standard library to do that. It won't be any harder to migrate from an old path object to a new path object than from os.path to a new path object, and in fact it would likely be considerably easier. >> *It is already used in a large body of real, working code, and >> therefore its limitations are known.* > >This is an important consideration. However, to me a clean API is more >important. It's not that I don't think a "clean" API is important.
It's that I think that "clean" is a subjective assessment that is hard to back up, and it helps to have some data saying "we think this is clean because there are very few bugs in this 100,000 line program written using it". Any code that is really easy to use right will tend to have *some* aesthetic appeal. >I took a quick look at filepath. It looks similar in concept to PEP >355. Four concerns: > - unfamiliar method names (createDirectory vs mkdir, child vs join) Fair enough, but "child" really means child, not join. It is explicitly for joining one additional segment, with no slashes in it. > - basename/dirname/parent are methods rather than properties: >leads to () overproliferation in user code. The () is there because every invocation returns a _new_ object. I think that this is correct behavior but I also would prefer that it remain explicit. > - the "secure" features may not be necessary. If they are, this >should be a separate discussion, and perhaps implemented as a >subclass. The main "secure" feature is "child" and it is, in my opinion, the best part about the whole class. Some of the other stuff (rummaging around for siblings with extensions, for example) is probably extraneous. child, however, lets you take a string from arbitrary user input and map it into a path segment, both securely and quietly. Here's a good example (and this actually happened, this is how I know about that crazy windows 'special files' thing I wrote in my other recent message): you have a decision-making program that makes two files to store information about a process: "pro" and "con". It turns out that "con" is shorthand for "fall in a well and die" in win32-ese. A "secure" path manipulation library would alert you to this problem with a traceback rather than having it inexplicably freeze. Obscure, sure, but less obscure would be getting deterministic errors from a user entering slashes into a text field that shouldn't accept them. 
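The hazard a child-style API guards against is easy to demonstrate with plain path functions: a single crafted "segment" silently escapes the base directory, while a child-style check turns the same input into a deterministic error. A sketch (the `child` helper below is a hypothetical reconstruction of the check being described, not FilePath's actual code; posixpath pins the POSIX rules on any platform):

```python
import posixpath

def child(base, segment):
    # Hypothetical sketch of a FilePath.child-style check: refuse any
    # "segment" that is really more than one path segment.
    if "/" in segment or segment in (".", ".."):
        raise ValueError("invalid path segment: %r" % (segment,))
    return posixpath.join(base, segment)

# Plain join lets untrusted input walk out of the base directory:
escaped = posixpath.normpath(posixpath.join("/srv/uploads", "../../etc/passwd"))

# child() raises instead of producing a path outside the base:
try:
    child("/srv/uploads", "../../etc/passwd")
    rejected = False
except ValueError:
    rejected = True
```
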
> - stylistic objection to verbose camelCase names like createDirectory There is no accounting for taste, I suppose. Obviously if it violates the stdlib's naming conventions it would have to be adjusted. >> Path representation is a bike shed. Nobody would have proposed >> writing an entirely new embedded database engine for Python: python >> 2.5 simply included SQLite because its utility was already proven. > >There's a quantum level of difference between path/file manipulation >-- which has long been considered a requirement for any full-featured >programming language -- and a database engine which is much more >complex. "quantum" means "the smallest possible amount", although I don't think you're using it like that, so I think I agree with you. No, it's not as hard as writing a database engine. Nevertheless it is a non-trivial problem, one worthy of having its own library and clearly capable of generating a fair amount of its own discussion. >Fredrik has convinced me that it's more urgent to OOize the pathname >conversions than the filesystem operations. I agree on the relative values. I am still unconvinced that either is "urgent" in the sense that it needs to be in the standard library. >Where have all the proponents of non-OO or limited-OO strategies been? This continuum doesn't make any sense to me. Where would you place Twisted's solution on it? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061101/8fff62db/attachment.htm From oliphant.travis at ieee.org Wed Nov 1 23:49:08 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 15:49:08 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <4548FE08.7070402@v.loewis.de> References: <4548FE08.7070402@v.loewis.de> Message-ID: Martin v. Löwis wrote: > Travis E.
Oliphant schrieb: > >>2) complex-valued types (you might argue that it's just a 2-array of >>floats, but you could say the same thing about int as an array of >>bytes). The point is how do people interpret the data. Complex-valued >>data-types are very common. It is one reason Fortran is still used by >>scientists. > > > Well, by the same reasoning, you could argue that pixel values (RGBA) > are missing in the PEP. It's a convenience, sure, and it may also help > interfacing with the platform's FORTRAN implementation - however, are > you sure that NumPy's complex layout is consistent with the platform's > C99 _Complex definition? > I think so (it is on gcc). And yes, where you draw the line between fundamental and "derived" data-type is somewhat arbitrary. I'd rather include complex-numbers than not given their prevalence in the data-streams I'm trying to make compatible with each other. > >>3) Unicode characters >> >>4) What about floating-point representations that are not IEEE 754 >>4-byte or 8-byte. > > > Both of these are available in a platform-dependent way: if the > platform uses non-IEEE754 formats for C float and C double, ctypes > will interface with that just fine. It is actually vice versa: > IEEE-754 4-byte and 8-byte is not supported in ctypes. That's what I meant. The 'f' kind in the data-type description is also intended to mean "platform float" whatever that is. But, a complete data-format representation would have a way to describe other bit-layouts for floating point representation. Even if you can't actually calculate directly with them without conversion. > Same for Unicode: the platform's wchar_t is supported (as you said), > but not a platform-independent (say) 4-byte little-endian. Right. It's a matter of scope. Frankly, I'd be happy enough to start with "typecodes" in the extended buffer protocol (that's where the array module is now) and then move up to something more complete later. 
But, since we already have an array interface for record-arrays to share information and data with each other, and ctypes is showing all of its power, why not be more complete? -Travis From alexander.belopolsky at gmail.com Thu Nov 2 00:42:25 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Wed, 1 Nov 2006 23:42:25 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <4548FE08.7070402@v.loewis.de> Message-ID: Travis Oliphant ieee.org> writes: > Frankly, I'd be happy enough to start with > "typecodes" in the extended buffer protocol (that's where the array > module is now) and then move up to something more complete later. > Let's just start with that. The way I see the problem is that the buffer protocol is fine as long as your data is an array of bytes, but if it is an array of doubles, you are out of luck. So, while I can do >>> b = buffer(array('d', [1,2,3])) there is not much that I can do with b. For example, if I want to pass it to numpy, I will have to provide the type and shape information myself: >>> numpy.ndarray(shape=(3,), dtype=float, buffer=b) array([ 1., 2., 3.]) With the extended buffer protocol, I should be able to do >>> numpy.array(b) So let's start by solving this problem and limit it to data that can be found in a standard library array. This way we can postpone the discussion of shapes, strides and nested structs. I propose a simple bf_gettypeinfo(PyObject *obj, int* type, int* bitsize) method that would return a type code and the size of the data item. I believe it is better to have type codes free from size information for several reasons: 1. Generic code can use size information directly without having to know that int is 32 and double is 64 bits. 2. Odd sizes can be easily described without having to add a new type code. 3. I assume that the existing bf_ functions would still return size in bytes, so having item size available as an int will help to get the number of items.
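At the Python level, the pairing proposed above (a size-free type code plus an item size in bytes) already exists on the standard library's array objects, which makes the split easy to sketch. A pure-Python illustration only: `bf_gettypeinfo` itself is a proposed C slot, not an existing API.

```python
import struct
from array import array

a = array('d', [1.0, 2.0, 3.0])

typecode = a.typecode   # the primitive type, free of size information
itemsize = a.itemsize   # the size, carried separately, in bytes

# Generic code can use the size directly (reason 1)...
assert struct.calcsize(typecode) == itemsize
# ...and total byte length divided by item size gives the item count (reason 3)
nitems = len(a.tobytes()) // itemsize
```
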
If we manage to agree on the standard way to pass primitive type information, it will be a big achievement and immediately useful because simple arrays are already in the standard library. From p.f.moore at gmail.com Thu Nov 2 01:01:40 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 2 Nov 2006 00:01:40 +0000 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: <79990c6b0611011601h37b1c805rda5bbee22127ce18@mail.gmail.com> On 11/1/06, Alexander Belopolsky wrote: > Let's just start with that. The way I see the problem is that buffer protocol > is fine as long as your data is an array of bytes, but if it is an array of > doubles, you are out of luck. So, while I can do > > >>> b = buffer(array('d', [1,2,3])) > > there is not much that I can do with b. For example, if I want to pass it to > numpy, I will have to provide the type and shape information myself: > > >>> numpy.ndarray(shape=(3,), dtype=float, buffer=b) > array([ 1., 2., 3.]) > > With the extended buffer protocol, I should be able to do > > >>> numpy.array(b) As a data point, this is the first posting that has clearly explained to me what the two PEPs are attempting to achieve. That may be my blindness to what others find self-evident, but equally, I may not be the only one who needed this example... Paul. From oliphant.travis at ieee.org Thu Nov 2 01:46:48 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 17:46:48 -0700 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. 
In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: Fredrik Lundh wrote: > Chris Barker wrote: > > >>While /F suggested we get off the PIL bandwagon > > > I suggest we drop the obsession with pointers to memory areas that are > supposed to have a specific format; modern data access APIs don't work > that way for good reasons, so I don't see why Python should grow a > standard based on that kind of model. > > the "right solution" for things like this is an *API* that lets you do > things like: > > view = object.acquire_view(region, supported formats) > ... access data in view ... > view.release() > > and, for advanced users > > format = object.query_format(constraints) So, if the extended buffer protocol were enhanced to enforce this kind of viewing and release, then would you support it? Basically, the extended buffer protocol would, at the same time as providing *more* information about the "view", require the implementer to understand the idea of "holding" and "releasing" the view. Would this basically require the object supporting the extended buffer protocol to keep some kind of list of who has views (or at least a number indicating how many views there are)? -Travis From oliphant.travis at ieee.org Thu Nov 2 01:58:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 17:58:01 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: Alexander Belopolsky wrote: > Travis Oliphant ieee.org> writes: > > >>>>b = buffer(array('d', [1,2,3])) > > > there is not much that I can do with b. For example, if I want to pass it to > numpy, I will have to provide the type and shape information myself: > > >>>>numpy.ndarray(shape=(3,), dtype=float, buffer=b) > > array([ 1., 2., 3.]) > > With the extended buffer protocol, I should be able to do > > >>>>numpy.array(b) or just numpy.array(array.array('d',[1,2,3])) and leave out the buffer object altogether.
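The gap Alexander describes (a buffer that has forgotten it holds doubles) is exactly what an extended protocol would close: the type and shape information travels with the memory, so the consumer needs no side channel. This is the design that later reached the stdlib as memoryview; a sketch with those later pieces, as an illustration of the goal rather than the API under discussion:

```python
from array import array

m = memoryview(array('d', [1.0, 2.0, 3.0]))

fmt = m.format     # 'd': the type information a plain buffer() loses
shape = m.shape    # (3,): enough for a consumer to reconstruct the array
values = m.tolist()
```
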
> > > So let's start by solving this problem and limit it to data that can be found > in a standard library array. This way we can postpone the discussion of shapes, > strides and nested structs. Don't lump those ideas together. Shapes and strides are necessary for N-dimensional arrays (they are essentially what *defines* an N-dimensional array). I really don't want to sacrifice those in the extended buffer protocol. If you want to separate them into different functions then that is a possibility. > > If we manage to agree on the standard way to pass primitive type information, > it will be a big achievement and immediately useful because simple arrays are > already in the standard library. > We could start there, I suppose. Especially if it helps us all get on the same page. But, we already see the applications beyond this simple case, so I would like to have at least an "eye" for the more difficult case, which we already have a working solution for in the "array interface". -Travis From oliphant.travis at ieee.org Thu Nov 2 02:08:41 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed, 01 Nov 2006 18:08:41 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> Message-ID: <45494519.4020501@ieee.org> Paul Moore wrote: > > > Enough of the abstract. As a concrete example, suppose I have a (byte) > string in my program containing some binary data - an ID3 header, or a > TCP packet, or whatever. It doesn't really matter. Does your proposal > offer anything to me in how I might manipulate that data (assuming I'm > not using NumPy)? (I'm not insisting that it should, I'm just trying > to understand the scope of the PEP). > What do you mean by "manipulate the data"?
The proposal for a data-format object would help you describe that data in a standard way and therefore share that data between several libraries that would be able to understand the data (because they all use and/or understand the default Python way to handle data-formats). It would be up to the other packages to "manipulate" the data. So, what you would be able to do is take your byte-string and create a buffer object which you could then share with other packages: Example: b = buffer(bytestr, format=data_format_object) Now. a = numpy.frombuffer(b) a['field1'] # prints data stored in the field named "field1" etc. Or. cobj = ctypes.frombuffer(b) # Now, cobj is a ctypes object that is basically a "structure" that can be passed # directly to your C-code. Does this help? -Travis From sluggoster at gmail.com Thu Nov 2 02:46:49 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 17:46:49 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> On 11/1/06, glyph at divmod.com wrote: > > On 06:14 pm, fredrik at pythonware.com wrote: > >glyph at divmod.com wrote: > > > >> I assert that it needs a better[1] interface because the current > >> interface can lead to a variety of bugs through idiomatic, apparently > >> correct usage. All the more because many of those bugs are related to > >> critical errors such as security and data integrity.
> >instead of referring to some esoteric knowledge about file systems that > >us non-twisted-using mere mortals may not be evolved enough to understand, > > On the contrary, twisted users understand even less, because (A) we've been > demonstrated to get it wrong on numerous occasions in highly public and > embarrassing ways and (B) we already have this class that does it all for us > and we can't remember how it works :-). This is ironic coming from one of Python's celebrity geniuses. "We made this class but we don't know how it works." Actually, it's downright alarming coming from someone who knows Twisted inside and out yet still can't make sense of path platform oddities. > * This is confusing as heck: > >>> os.path.join("hello", "/world") > '/world' That's in the documentation. I'm not sure it's "wrong". What should it do in this situation? Pretend the slash isn't there? This came up in the directory-tuple proposal. I said there was no reason to change the existing behavior of join. Noam favored an exception. > >>> os.path.join("hello", "slash/world") > 'hello/slash/world' That has always been a loophole in the function, and many programs depend on it. Again, is it "wrong"? Should an embedded separator in an argument be an error? Obviously this depends on the user's knowledge that the separator happens to be slash. > >>> os.path.join("hello", "slash//world") > 'hello/slash//world' Again a case of what "should" it do? The filesystem treats it as a single slash. The user didn't call normpath, so should we normalize it anyway? > * Sometimes a path isn't a path; the zip "paths" in sys.path are a good > example. This is why I'm a big fan of including a polymorphic interface of > some kind: this information is *already* being persisted in an ad-hoc and > broken way now, so it needs to be represented; it would be good if it were > actually represented properly.
URL > manipulation-as-path-manipulation is another; the recent > perforce use-case mentioned here is a special case of that, I think. Good point, but exactly what functionality do you want to see for zip files and URLs? Just pathname manipulation? Or the ability to see whether a file exists and extract it, copy it, etc? > * you have to care about unicode sometimes. rarely enough that none of > your tests will ever account for it, but often enough that _some_ users will > notice breakage if your code is ever widely distributed. This is a Python-wide problem. The move to universal unicode will lessen this, or at least move the problem to *one* place (creating the unicode object), where every Python programmer will get bitten by it and we'll develop a few standard strategies to deal with it. (The problem is that if str and unicode are mixed in expressions, Python will promote the str to unicode and you'll get a UnicodeDecodeError if it contains non-ASCII characters. Figuring out all the ways such strings can slip into a program is difficult if you're dealing with user strings from an unknown charset, or your MySQL server is configured differently than you thought it was, or the string contains Windows curly quotes et al which are undefined in Latin-1.) > * the documentation really can't emphasize enough how bad using > 'os.path.exists/isfile/isdir', and then assuming the file continues to exist > when it is a contended resource, is. It can be handy, but it is _always_ a > race condition. What else can you do? It's either os.path.exists()/os.remove() or "do it anyway and catch the exception". And sometimes you have to check the filetype in order to determine *what* to do. 
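One standard answer to "what else can you do?" is the second option named above, often called EAFP: perform the operation and handle the failure, so there is no gap between checking and acting for another process to exploit. A minimal sketch:

```python
import errno
import os
import tempfile

# A path that does not exist: exists() followed by remove() would race;
# instead, attempt the removal and handle the failure directly.
path = os.path.join(tempfile.mkdtemp(), "transient.txt")
try:
    os.remove(path)
    removed = True
except OSError as e:   # FileNotFoundError in later Pythons
    assert e.errno == errno.ENOENT
    removed = False
```
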
-- Mike Orr
From sluggoster at gmail.com Thu Nov 2 03:36:31 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 18:36:31 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <20061101222903.14394.973593042.divmod.xquotient.391@joule.divmod.com> References: <20061101222903.14394.973593042.divmod.xquotient.391@joule.divmod.com> Message-ID: <6e9196d20611011836k5e62990pf8851d066ea120b2@mail.gmail.com> On 11/1/06, glyph at divmod.com wrote: > On 08:14 pm, sluggoster at gmail.com wrote: > >(...) people have had to spend five years putting hard-to-read > >os.path functions in the code, or reinventing the wheel with their own > >libraries that they're not sure they can trust. I started to use > >path.py last year when it looked like it was emerging as the basis of > >a new standard, but yanked it out again when it was clear the API > >would be different by the time it's accepted. I've gone back to > >os.path for now until something stable emerges but I really wish I > >didn't have to. > > You *don't* have to. This is a weird attitude I've encountered over and > over again in the Python community, although sometimes it masquerades as > resistance to Twisted or Zope or whatever. It's OK to use libraries. It's > OK even to use libraries that Guido doesn't like! I'm pretty sure the first > person to tell you that would be Guido himself. (Well, second, since I just > told you.) If you like path.py and it solves your problems, use path.py. > You don't have to cram it into the standard library to do that. It won't be > any harder to migrate from an old path object to a new path object than from > os.path to a new path object, and in fact it would likely be considerably > easier. Oh, I understand it's OK to use libraries. It's just that a path library needs to be widely tested and well supported so you know it won't scramble your files. A bug in a date library affects only datetimes. A bug in a database library affects only that database.
A bug in a template library affects only the page being output. But a bug in a path library could ruin your whole day. "Um, remember those important files in that other project directory you weren't working in? They were just overwritten." Also, I train several programmers new to Python at work. I want to make them learn *one* path library that we'll be sure to stick with for several years. Every path library has subtle quirks, and switching from one to another may not be just a matter of renaming methods. > > - the "secure" features may not be necessary. If they are, this > >should be a separate discussion, and perhaps implemented as a > >subclass. > > The main "secure" feature is "child" and it is, in my opinion, the best part > about the whole class. Some of the other stuff (rummaging around for > siblings with extensions, for example) is probably extraneous. child, > however, lets you take a string from arbitrary user input and map it into a > path segment, both securely and quietly. Here's a good example (and this > actually happened, this is how I know about that crazy windows 'special > files' thing I wrote in my other recent message): you have a decision-making > program that makes two files to store information about a process: "pro" and > "con". It turns out that "con" is shorthand for "fall in a well and die" in > win32-ese. A "secure" path manipulation library would alert you to this > problem with a traceback rather than having it inexplicably freeze. > Obscure, sure, but less obscure would be getting deterministic errors from a > user entering slashes into a text field that shouldn't accept them. Perhaps you're right. I'm not saying it *should not* be a basic feature, just that unless the Python community as a whole is ready for this, users should have a choice to use it or not. I learned about DOS device files from the manuals back in the 80s. 
But I had completely forgotten them when I made several "aux" directories in a Subversion repository on Linux. People tried to check it out on Windows and... got some kind of error. "CON" means console: its input comes from the keyboard and its output goes to the screen. Since this is a device file, I'm not sure a path library has any responsibility to treat it specially. We don't treat "/dev/stdout" specially unless the user specifically calls a device function. I have no idea why Microsoft thought it was a good idea to put the seven-odd device files in every directory. Why not force people to type the colon ("CON:")? If they've memorized what CON means they should have no trouble with the colon, especially since it's required with "A:" and "C:" anyway. For trivia, these are the ones I remember:

CON         console (keyboard input, screen output)
KBRD        keyboard input
???         screen output
LPT1/2/3    parallel ports
COM1/2/3/4  serial ports
PRN         alias for the default printer port (normally LPT1)
NUL         bit bucket
AUX         game port?

COPY CON FILENAME.TXT   # Unix: "cat >filename.txt".
COPY FILENAME.TXT PRN   # Unix: "lp filename.txt" or "cat filename.txt | lp".
TYPE FILENAME.TXT       # Unix: "cat filename.txt".

> >Where have all the proponents of non-OO or limited-OO strategies been? > > This continuum doesn't make any sense to me. Where would you place > Twisted's solution on it? In the "let's create a brilliant library and put a dark box around it so nobody knows it's there" position. Although you say you've been trying to spread the word about it. For whatever reason, I haven't heard about it till now. Not sure what this means. But what I meant is, we OO proponents have been trying to promote path.py and/or get a similar module into the stdlib for years, and all we got was... not even hostility... just indifference and silence. People like to complain about os.path but not do anything about fixing it, or even to say which approach they *would* support.
Talin started a great thread on the python-3000 list, going back to the beginning and saying "What is wrong with os.path, how much does it need fixing, and is consensus on an API possible?" Maybe he did what the rest of us (including me) should have done long ago. -- Mike Orr From glyph at divmod.com Thu Nov 2 04:18:27 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 02 Nov 2006 03:18:27 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> On 01:46 am, sluggoster at gmail.com wrote: >On 11/1/06, glyph at divmod.com wrote: >This is ironic coming from one of Python's celebrity geniuses. "We >made this class but we don't know how it works." Actually, it's >downright alarming coming from someone who knows Twisted inside and >out yet still can't make sense of path platform oddities. Man, it is going to be hard being ironically self-deprecating if people keep going around calling me a "celebrity genius". My ego doesn't need any help, you know? :) In some sense I was being serious; part of the point of abstraction is embedding some of your knowledge in your code so you don't have to keep it around in your brain all the time. I'm sure that my analysis of path-based problems wasn't exhaustive because I don't really use os.path for path manipulation. I use static.File and it _works_, I only remember these os.path flaws from the process of writing it, not daily use. >> * This is confusing as heck: >> >>> os.path.join("hello", "/world") >> '/world' > >That's in the documentation. I'm not sure it's "wrong". What should >it do in this situation? Pretend the slash isn't there? You can document anything. That doesn't really make it a good idea. The point I was trying to make wasn't really that os.path is *wrong*. Far from it, in fact, it defines some useful operations and they are basically always correct. I didn't even say "wrong", I said "confusing".
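The behavior being called "confusing" here is easy to pin down; using posixpath directly gives the POSIX rules regardless of the platform the snippet runs on:

```python
import posixpath

# Ordinary join:
assert posixpath.join("hello", "world") == "hello/world"
# The documented-but-surprising rule: an absolute component
# discards everything before it.
assert posixpath.join("hello", "/world") == "/world"
# Embedded and doubled separators pass through untouched;
# join does not normalize.
assert posixpath.join("hello", "slash//world") == "hello/slash//world"

result = posixpath.join("hello", "/world")
```
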
FilePath is implemented strictly in terms of os.path because it _does_ do the right thing with its inputs. The question is, how hard is it to remember what its inputs should be? >> >>> os.path.join("hello", "slash/world") >> 'hello/slash/world' > >That has always been a loophole in the function, and many programs >depend on it. If you ever think I'm suggesting breaking something in Python, you're misinterpreting me ;). I am as cagey as they come about this. No matter what else happens, the behavior of os.path should not really change. >The user didn't call normpath, so should we normalize it anyway? That's really the main point here. What is a path that hasn't been "normalized"? Is it a path at all, or is it some random garbage with slashes (or maybe other things) in it? os.path performs correct path algebra on correct inputs, and it's correct (as far as one can be correct) on inputs that have weird junk in them. In the strings-and-functions model of paths, this all makes perfect sense, and there's no particular sensibility associated with defining ideas like "equivalency" for paths, unless that's yet another function you pass some strings to. I definitely prefer this: path1 == path2 to this: os.path.abspath(pathstr1) == os.path.abspath(pathstr2) though. You'll notice I used abspath instead of normpath. As a side note, I've found interpreting relative paths as always relative to the current directory is a bad idea. You can see this when you have a daemon that daemonizes and then opens files: the user thinks they're specifying relative paths from wherever they were when they ran the program, the program thinks they're relative paths from /var/run/whatever. Relative paths, if they should exist at all, should have to be explicitly linked as relative to something *else* (e.g. made absolute) before they can be used. I think that sequences of strings might be sufficient though. >Good point, but exactly what functionality do you want to see for zip >files and URLs? 
Just pathname manipulation? Or the ability to see >whether a file exists and extract it, copy it, etc? The latter. See http://twistedmatrix.com/trac/browser/trunk/twisted/python/zippath.py This is still _really_ raw functionality though. I can't claim that it has the same "it's been used in real code" endorsement as the rest of the FilePath stuff I've been talking about. I've never even tried to hook this up to a Twisted webserver, and I've only used it in one environment. >> * you have to care about unicode sometimes. >This is a Python-wide problem. I completely agree, and this isn't the thread to try to solve it. The absence of a path object, however, and the path module's reliance on strings, exacerbates the problem. The fact that FilePath doesn't deal with this either, however, is a fairly good indication that the problem is deeper than that. >> * the documentation really can't emphasize enough how bad using >> 'os.path.exists/isfile/isdir', and then assuming the file continues to exist >> when it is a contended resource, is. It can be handy, but it is _always_ a >> race condition. > >What else can you do? It's either os.path.exists()/os.remove() or "do >it anyway and catch the exception". And sometimes you have to check >the filetype in order to determine *what* to do. You have to catch the exception anyway in many cases. I probably shouldn't have mentioned it though, it's starting to get a bit far afield of even this ridiculously far-ranging discussion. A more accurate criticism might be that "the absence of a file locking system in the stdlib means that there are lots outside it, and many are broken". Different issue though; if it's related, it's a different method that can be added later. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061102/6d496b43/attachment.html From sluggoster at gmail.com Thu Nov 2 04:42:44 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 19:42:44 -0800 Subject: [Python-Dev] Path object design In-Reply-To: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> References: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> Message-ID: <6e9196d20611011942x7d09789ah428f833f623029@mail.gmail.com> On 11/1/06, glyph at divmod.com wrote: > On 01:46 am, sluggoster at gmail.com wrote: > >On 11/1/06, glyph at divmod.com wrote: > > >This is ironic coming from one of Python's celebrity geniuses. "We >made this class but we don't know how it works." Actually, it's >downright alarming coming from someone who knows Twisted inside and >out yet still can't make sense of path platform oddities. > > Man, it is going to be hard being ironically self-deprecating if people keep > going around calling me a "celebrity genius". My ego doesn't need any help, > you know? :) I respect Twisted in the same way I respect a loaded gun. It's powerful, but approach with caution. > If you ever think I'm suggesting breaking something in Python, you're > misinterpreting me ;). I am as cagey as they come about this. No matter > what else happens, the behavior of os.path should not really change. The point is, what *should* a join-like method do in a future improved path module? os.path.join should not change because too many programs depend on its current behavior, in ways we can't necessarily predict. But a new function/method is not bound by these constraints, as long as the boundary cases are well documented. All the os.path and file-related os/shutil functions need to be reexamined in this context. Maybe the existing behavior is best, maybe we'll keep it even if it's sub-optimal, but we should document why we're making these choices. 
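For reference, the boundary cases a new join-like method would have to document are the ones os.path.join exhibits today (posixpath is used below so the results are the same on any platform):

```python
import posixpath

# Embedded separators pass straight through -- the "loophole" mentioned above:
assert posixpath.join("hello", "slash/world") == "hello/slash/world"

# An absolute second argument discards everything before it:
assert posixpath.join("hello", "/world") == "/world"

# Redundant separators survive join() and are only collapsed by normpath():
assert posixpath.join("hello", "slash//world") == "hello/slash//world"
assert posixpath.normpath("hello/slash//world") == "hello/slash/world"
```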
> >The user didn't call normpath, so should we normalize it anyway? > > That's really the main point here. > > What is a path that hasn't been "normalized"? Is it a path at all, or is it > some random garbage with slashes (or maybe other things) in it? os.path > performs correct path algebra on correct inputs, and it's correct (as far as > one can be correct) on inputs that have weird junk in them. I'm tempted to say Path("/a/b").join("c", "d") should do the same thing your .child method does, but allow multiple levels in one step. But on the other hand, there will always be people with prebuilt "path/fragments" to join to other fragments, and I'm not sure we should force them to split the fragment just to rejoin it again. Maybe we need a .join_unsafe method for this, haha. -- Mike Orr From alexander.belopolsky at gmail.com Thu Nov 2 05:42:14 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 2 Nov 2006 04:42:14 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <4548FE08.7070402@v.loewis.de> Message-ID: Travis Oliphant ieee.org> writes: > > Don't lump those ideas together. Shapes and strides are necessary for > N-dimensional array's (it's essentially what *defines* the N-dimensional > array). I really don't want to sacrifice those in the extended buffer > protocol. If you want to separate them into different functions then > that is a possibility. > I don't understand. Do you want to discuss shapes and strides separately from the datatype or not? Note that in ctypes shape is a property of datatype (as in c_int*2*3). In your proposal, shapes and strides are communicated separately. This presents a unique memory management challenge: if the object does not contain shape information in a ready to be pointed to form, who is responsible for deallocating the shape array? 
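The ctypes half of that contrast is visible directly from Python: the shape is baked into the type object itself, so describing a resized buffer means manufacturing a new type rather than reusing the old one.

```python
import ctypes

# "c_int*2*3" reads left to right: an array of 3 arrays of 2 c_ints.
IntArr = ctypes.c_int * 2 * 3
print(IntArr.__name__)   # e.g. c_int_Array_2_Array_3
assert ctypes.sizeof(IntArr) == 6 * ctypes.sizeof(ctypes.c_int)

# Grow the buffer and the old datatype object cannot be reused; a new
# type, carrying its own shape, has to be created:
Bigger = ctypes.c_int * 2 * 4
assert Bigger is not IntArr
assert ctypes.sizeof(Bigger) == 8 * ctypes.sizeof(ctypes.c_int)
```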
> > > > If we manage to agree on the standard way to pass primitive type information, > > it will be a big achievement and immediately useful because simple arrays are > > already in the standard library. > > > > We could start there, I suppose. Especially if it helps us all get on > the same page. Let's start: 1. Should primitive types be associated with simple type codes (short, int, long, float, double) or type/size pairs [(int,16), (int, 32), (int, 64), (float, 32), (float, 64)]? - I prefer pairs 2. Should primitive type codes be characters or integers (from an enum) at C level? - I prefer integers 3. Should size be expressed in bits or bytes? - I prefer bits From oliphant.travis at ieee.org Thu Nov 2 06:01:50 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Wed, 01 Nov 2006 22:01:50 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: Alexander Belopolsky wrote: > Travis Oliphant ieee.org> writes: >> Don't lump those ideas together. Shapes and strides are necessary for >> N-dimensional array's (it's essentially what *defines* the N-dimensional >> array). I really don't want to sacrifice those in the extended buffer >> protocol. If you want to separate them into different functions then >> that is a possibility. >> > > I don't understand. Do you want to discuss shapes and strides separately > from the datatype or not? Note that in ctypes shape is a property of > datatype (as in c_int*2*3). In your proposal, shapes and strides are > communicated separately. This presents a unique memory management > challenge: if the object does not contain shape information in a ready to > be pointed to form, who is responsible for deallocating the shape array? > Perhaps a "view object" should be returned like /F suggests and it manages the shape, strides, and data-format. 
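A Python-level caricature of that view-object suggestion — all names here are hypothetical, not part of any proposal — shows how a view can own the shape and strides, so their lifetime is tied to the view and the deallocation question goes away:

```python
import struct

class ArrayView(object):
    """Hypothetical 'view object': bundles a buffer with the metadata
    (format, shape, strides) needed to interpret it. Because the view
    owns the shape/strides tuples, nobody else has to free them."""

    def __init__(self, data, format, shape):
        self.data = data              # the raw bytes being described
        self.format = format          # struct-style code for one element
        self.shape = tuple(shape)     # owned by the view
        itemsize = struct.calcsize(format)
        # C-contiguous strides, computed once and kept alive with the view:
        strides = []
        stride = itemsize
        for dim in reversed(self.shape):
            strides.append(stride)
            stride *= dim
        self.strides = tuple(reversed(strides))

    def __getitem__(self, index):     # index is a full tuple of ints
        offset = sum(i * s for i, s in zip(index, self.strides))
        return struct.unpack_from(self.format, self.data, offset)[0]

# A 2x3 array of little-endian doubles:
data = struct.pack('<6d', *range(6))
v = ArrayView(data, '<d', (2, 3))
print(v.strides)    # (24, 8)
print(v[1, 2])      # 5.0
```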
>>> If we manage to agree on the standard way to pass primitive type information, >>> it will be a big achievement and immediately useful because simple arrays are >>> already in the standard library. >>> >> We could start there, I suppose. Especially if it helps us all get on >> the same page. > > Let's start: > > 1. Should primitive types be associated with simple type codes (short, int, long, > float, double) or type/size pairs [(int,16), (int, 32), (int, 64), (float, 32), > (float, 64)]? > - I prefer pairs > > 2. Should primitive type codes be characters or integers (from an enum) at > C level? > - I prefer integers Are these orthogonal? > > 3. Should size be expressed in bits or bytes? > - I prefer bits > So, you want an integer enum for the "kind" and an integer for the bitsize? That's fine with me. One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined in structmember.h already. Should we just re-use those #defines while adding to them to make an easy to use interface for primitive types? -Travis From alexander.belopolsky at gmail.com Thu Nov 2 06:42:26 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 2 Nov 2006 05:42:26 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <4548FE08.7070402@v.loewis.de> Message-ID: Travis E. Oliphant <oliphant.travis at ieee.org> writes: > > Alexander Belopolsky wrote: > > ... > > 1. Should primitive types be associated with simple type codes (short, int, long, > > float, double) or type/size pairs [(int,16), (int, 32), (int, 64), (float, 32), > > (float, 64)]? > > - I prefer pairs > > > > 2. Should primitive type codes be characters or integers (from an enum) at > > C level? > > - I prefer integers > > Are these orthogonal? > Do you mean are my questions 1 and 2 orthogonal? I guess they are. > > > > 3. Should size be expressed in bits or bytes? > > - I prefer bits > > > > So, you want an integer enum for the "kind" and an integer for the > bitsize? 
That's fine with me. > > One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined > in structmember.h already. Should we just re-use those #defines while > adding to them to make an easy to use interface for primitive types? > I was thinking about using something like NPY_TYPES enum, but T_* codes would work as well. Let me just present both options for the record: --- numpy/ndarrayobject.h --- enum NPY_TYPES { NPY_BOOL=0, NPY_BYTE, NPY_UBYTE, NPY_SHORT, NPY_USHORT, NPY_INT, NPY_UINT, NPY_LONG, NPY_ULONG, NPY_LONGLONG, NPY_ULONGLONG, NPY_FLOAT, NPY_DOUBLE, NPY_LONGDOUBLE, NPY_CFLOAT, NPY_CDOUBLE, NPY_CLONGDOUBLE, NPY_OBJECT=17, NPY_STRING, NPY_UNICODE, NPY_VOID, NPY_NTYPES, NPY_NOTYPE, NPY_CHAR, /* special flag */ NPY_USERDEF=256 /* leave room for characters */ }; --- structmember.h --- /* Types */ #define T_SHORT 0 #define T_INT 1 #define T_LONG 2 #define T_FLOAT 3 #define T_DOUBLE 4 #define T_STRING 5 #define T_OBJECT 6 /* XXX the ordering here is weird for binary compatibility */ #define T_CHAR 7 /* 1-character string */ #define T_BYTE 8 /* 8-bit signed int */ /* unsigned variants: */ #define T_UBYTE 9 #define T_USHORT 10 #define T_UINT 11 #define T_ULONG 12 /* Added by Jack: strings contained in the structure */ #define T_STRING_INPLACE 13 #define T_OBJECT_EX 16 /* Like T_OBJECT, but raises AttributeError when the value is NULL, instead of converting to None. */ #ifdef HAVE_LONG_LONG #define T_LONGLONG 17 #define T_ULONGLONG 18 #endif /* HAVE_LONG_LONG */ From martin at v.loewis.de Thu Nov 2 07:09:12 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 02 Nov 2006 07:09:12 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: <45498B88.1000306@v.loewis.de> Travis E. Oliphant schrieb: >> 2. Should primitive type codes be characters or integers (from an enum) at >> C level? >> - I prefer integers > >> 3. 
Should size be expressed in bits or bytes? >> - I prefer bits >> > > So, you want an integer enum for the "kind" and an integer for the > bitsize? That's fine with me. > > One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined > in structmember.h already. Should we just re-use those #defines while > adding to them to make an easy to use interface for primitive types? Notice that those type codes imply sizes, namely the platform sizes (where "platform" always means "what the C compiler does"). So if you want to have platform-independent codes as well, you shouldn't use the T_ codes. Regards, Martin From sluggoster at gmail.com Thu Nov 2 07:47:54 2006 From: sluggoster at gmail.com (Mike Orr) Date: Wed, 1 Nov 2006 22:47:54 -0800 Subject: [Python-Dev] Mini Path object Message-ID: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> Posted to python-dev and python-3000. Follow-ups to python-dev only please. On 10/31/06, Fredrik Lundh wrote: > here's mine; it's fully backwards compatible, can go right into 2.6, > and can be incrementally improved in future releases: > > 1) add a pathname wrapper to "os.path", which lets you do basic > path "algebra". this should probably be a subclass of unicode, > and should *only* contain operations on names. > > 2) make selected "shutil" operations available via the "os" name- > space; the old POSIX API vs. POSIX SHELL distinction is pretty > irrelevant. also make the os.path predicates available via the > "os" namespace. > > this gives a very simple conceptual model for the user; to manipulate > path *names*, use "os.path.(string)" functions or the "" > wrapper. to manipulate *objects* identified by a path, given either as > a string or a path wrapper, use "os.(path)". this can be taught in > less than a minute. 
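The two-part model quoted above — os.path.*(string) functions for pure name algebra, os.*(path) operations for the objects those names denote — is essentially the split that exists today, shutil aside; a quick illustration:

```python
import os
import os.path
import tempfile

# Pure name algebra: nothing here touches the filesystem.
name = os.path.join("data", "raw", "input.txt")
assert os.path.dirname(name) == os.path.join("data", "raw")
assert os.path.splitext(name)[1] == ".txt"

# Operations on the *object* a name identifies: these do touch the filesystem.
root = tempfile.mkdtemp()
target = os.path.join(root, "sub")
os.mkdir(target)               # manipulate the object behind the name
assert os.path.isdir(target)   # a predicate that would move to os.* in this model
os.rmdir(target)
os.rmdir(root)
```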
Given the widely-diverging views on what, if anything, should be done to os.path, how about we make a PEP and a standalone implementation of (1) for now, and leave (2) and everything else for a later PEP. This will make people who want a reasonably forward-compatable object NOW for their Python 2.4/2.5 programs happy, provide a common seed for more elaborate libraries that may be proposed for the standard library later (and eliminate the possibility of moving the other functions and later deprecating them), and provide a module that will be well tested by the time 2.6 is ready for finalization. There's already a reference implementation in PEP 355, we'd just have to strip out the non-pathname features. There's a copy here (http://wiki.python.org/moin/PathModule) that looks reasonably recent (constructors are self.__class__() to make it subclassable), although I wonder why the class is called path instead of Path. There was another copy in the Python CVS although I can't find it now; was it deleted in the move to Subversion? (I thought it was in /sandbox/trunk/: http://svn.python.org/view/sandbox/trunk/). So, let's say we strip this Path class to: class Path(unicode): Path("foo") Path( Path("directory"), "subdirectory", "file") # Replaces .joinpath(). Path() Path.cwd() Path("ab") + "c" => Path("abc") .abspath() .normcase() .normpath() .realpath() .expanduser() .expandvars() .expand() .parent .name # Full filename without path .namebase # Filename without extension .ext .drive .splitpath() .stripext() .splitunc() .uncshare .splitall() .relpath() .relpathto() Would this offend anyone? Are there any attribute renames or method enhancements people just can't live without? 'namebase' is the only name I hate but I could live with it. The multi-argument constructor is a replacement for joining paths. (The PEP says .joinpath was "problematic" without saying why.) 
This could theoretically go either way, doing either the same thing as os.path.join, getting a little smarter, or doing "safe" joins by disallowing "/" embedded in string arguments. I would say that a directory-tuple Path object with these features could be maintained in parallel, but since the remaining functions require string arguments you'd have to use unicode() a lot. -- Mike Orr From p.f.moore at gmail.com Thu Nov 2 09:46:09 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 2 Nov 2006 08:46:09 +0000 Subject: [Python-Dev] [Python-3000] Mini Path object In-Reply-To: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> References: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> Message-ID: <79990c6b0611020046j9d95781i378b65a55ea016c3@mail.gmail.com> On 11/2/06, Mike Orr wrote: > Given the widely-diverging views on what, if anything, should be done > to os.path, how about we make a PEP and a standalone implementation of > (1) for now, and leave (2) and everything else for a later PEP. Why write a PEP at this stage? Just release your proposal as a module, and see if people use it. If they do, write a PEP to include it in the stdlib. (That's basically what happened with the original PEP - it started off proposing Jason Orendorff's path module IIRC). >From what you're proposing, I may well use such a module, if it helps :-) (But I'm not sure I'd vote for it in to go the stdlib without having survived as an external module first) Paul. 
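To make the stripped-down class proposed upthread concrete, here is a minimal sketch — a str subclass (standing in for 2.x unicode) with the multi-argument constructor and a few of the listed properties. This is only an illustration of the shape of the API, not the PEP 355 reference implementation.

```python
import os
import os.path

class Path(str):
    def __new__(cls, *parts):
        # Path() -> current directory; Path(a, b, c) replaces .joinpath()
        joined = os.path.join(*parts) if parts else os.curdir
        return super(Path, cls).__new__(cls, joined)

    @classmethod
    def cwd(cls):
        return cls(os.getcwd())

    @property
    def parent(self):
        return self.__class__(os.path.dirname(self))

    @property
    def name(self):
        # full filename without the leading directories
        return os.path.basename(self)

    @property
    def ext(self):
        return os.path.splitext(self)[1]

p = Path("directory", "subdirectory", "file.tar.gz")
assert p == os.path.join("directory", "subdirectory", "file.tar.gz")
assert p.name == "file.tar.gz"
assert p.ext == ".gz"
assert isinstance(p.parent, Path)
```

Because Path subclasses str, every existing os.path and os function accepts it unchanged, which is the forward-compatibility property Fredrik's proposal is after.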
From p.f.moore at gmail.com Thu Nov 2 09:53:56 2006 From: p.f.moore at gmail.com (Paul Moore) Date: Thu, 2 Nov 2006 08:53:56 +0000 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <45494519.4020501@ieee.org> References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> <45494519.4020501@ieee.org> Message-ID: <79990c6b0611020053w6a11c424yfc6d329ab48e4a90@mail.gmail.com> On 11/2/06, Travis Oliphant wrote: > What do you mean by "manipulate the data." The proposal for a > data-format object would help you describe that data in a standard way > and therefore share that data between several library that would be able > to understand the data (because they all use and/or understand the > default Python way to handle data-formats). > > It would be up to the other packages to "manipulate" the data. Yes, some other messages I read since I posted this clarified it for me. Essentially, as a Python programmer, there's nothing in the PEP for me - it's for extension writers (and maybe writers of some lower-level Python modules? I'm not sure about this). So as I'm not really the target audience, I won't comment further. > So, what you would be able to do is take your byte-string and create a > buffer object which you could then share with other packages: > > Example: > > b = buffer(bytestr, format=data_format_object) > > Now. > > a = numpy.frombuffer(b) > a['field1'] # prints data stored in the field named "field1" > > etc. > > Or. > > cobj = ctypes.frombuffer(b) > > # Now, cobj is a ctypes object that is basically a "structure" that can > be passed # directly to your C-code. > > Does this help? Somewhat. My understanding is that the python-level buffer object is frowned upon as not good practice, and is scheduled for removal at some point (Py3K, quite possibly?) Hence, any code that uses buffer() feels like it "needs" to be replaced by something "more acceptable". 
So although I understand the use you suggest, it's not compelling to me because I am left with the feeling that I wish I knew "the way to do it that didn't need the buffer object" (even though I realise intellectually that such a way may not exist). Paul. From oliphant.travis at ieee.org Thu Nov 2 16:59:14 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu, 02 Nov 2006 08:59:14 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <45498B88.1000306@v.loewis.de> References: <4548FE08.7070402@v.loewis.de> <45498B88.1000306@v.loewis.de> Message-ID: Martin v. L?wis wrote: > Travis E. Oliphant schrieb: > >>>2. Should primitive type codes be characters or integers (from an enum) at >>>C level? >>> - I prefer integers >> >>>3. Should size be expressed in bits or bytes? >>> - I prefer bits >>> >> >>So, you want an integer enum for the "kind" and an integer for the >>bitsize? That's fine with me. >> >>One thing I just remembered. We have T_UBYTE and T_BYTE, etc. defined >>in structmember.h already. Should we just re-use those #defines while >>adding to them to make an easy to use interface for primitive types? > > > Notice that those type codes imply sizes, namely the platform sizes > (where "platform" always means "what the C compiler does"). So if > you want to have platform-independent codes as well, you shouldn't > use the T_ codes. > In NumPy we've found it convenient to use both. Basically, we've set up a header file that "does the translation" using #defines and typedefs to create things like (on a 32-bit platform) typedef npy_int32 int #define NPY_INT32 NPY_INT So, that either the T_code-like enum or the bit-width can be used interchangable. Typically people want to specify bit-widths (and see their data-types in bit-widths) but in C-code that implements something you need to use one of the platform integers. I don't know if we really need to bring all of that over. 
-Travis From theller at ctypes.org Thu Nov 2 21:15:19 2006 From: theller at ctypes.org (Thomas Heller) Date: Thu, 02 Nov 2006 21:15:19 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: <454A51D7.90505@ctypes.org> Travis E. Oliphant schrieb: > Travis E. Oliphant wrote: >> Thanks for all the comments that have been given on the data-type >> (data-format) PEP. I'd like opinions on an idea for revising the PEP I >> have. > >> >> 1) We could define a special string-syntax (or list syntax) that covers >> every special case. The array interface specification goes this >> direction and it requires no new Python types. This could also be seen >> as an extension of the "struct" module to allow for nested structures, etc. >> >> 2) We could define a Python object that specifically carries data-format >> information. >> >> >> Does that explain the goal of what I'm trying to do better? > > In other-words, what I'm saying is I really want a PEP that does this. > Could we have a discussion about what the best way to communicate > data-format information across multiple extension modules would look > like. I'm not saying my (pre-)PEP is best. The point of putting it in > it's infant state out there is to get the discussion rolling, not to > claim I've got all the answers. IIUC, so far the 'data-object' carries information about the structure of the data it describes. Couldn't it go a step further and have also some functionality? Converting the data into a Python object and back? This is what the ctypes SETFUNC and GETFUNC functions do, and what also is implemented in the struct module... 
Thomas From theller at ctypes.org Thu Nov 2 21:35:32 2006 From: theller at ctypes.org (Thomas Heller) Date: Thu, 02 Nov 2006 21:35:32 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <45468C8E.1000203@canterbury.ac.nz> <454789F9.7050808@ctypes.org> Message-ID: <454A5694.1020809@ctypes.org> Ronald Oussoren schrieb: > On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote: > >> >> This mechanism is probably a hack because it'n not possible to add >> C accessible >> fields to type objects, on the other hand it is extensible (in >> principle, at least). > > I better start rewriting PyObjC then :-). PyObjC stores some addition > information in the type objects that are used to describe Objective-C > classes (such as a reference to the proxied class). > > IIRC This has been possible from Python 2.3. I assume you are referring to the code in pyobjc/Modules/objc/objc-class.h ? If this really is reliable I should better start rewriting ctypes then ;-). Hm, I always thought there was some additional magic going on with type objects, fields appended dynamically at the end or whatever. Thomas From ronaldoussoren at mac.com Thu Nov 2 22:38:08 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 2 Nov 2006 22:38:08 +0100 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <454A5694.1020809@ctypes.org> References: <45468C8E.1000203@canterbury.ac.nz> <454789F9.7050808@ctypes.org> <454A5694.1020809@ctypes.org> Message-ID: On Nov 2, 2006, at 9:35 PM, Thomas Heller wrote: > Ronald Oussoren schrieb: >> On Oct 31, 2006, at 6:38 PM, Thomas Heller wrote: >> >>> >>> This mechanism is probably a hack because it'n not possible to add >>> C accessible >>> fields to type objects, on the other hand it is extensible (in >>> principle, at least). >> >> I better start rewriting PyObjC then :-). 
PyObjC stores some addition >> information in the type objects that are used to describe Objective-C >> classes (such as a reference to the proxied class). >> >> IIRC This has been possible from Python 2.3. > > I assume you are referring to the code in pyobjc/Modules/objc/objc- > class.h Yes. > > If this really is reliable I should better start rewriting ctypes > then ;-). > > Hm, I always thought there was some additional magic going on with > type > objects, fields appended dynamically at the end or whatever. There is such magic, but that magic was updated in Python 2.3 to allow type-object extensions like this. Ronald -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 3562 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061102/fc916cd2/attachment.bin From oliphant.travis at ieee.org Thu Nov 2 23:30:51 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu, 02 Nov 2006 15:30:51 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <454A51D7.90505@ctypes.org> References: <454A51D7.90505@ctypes.org> Message-ID: T > > IIUC, so far the 'data-object' carries information about the structure > of the data it describes. > > Couldn't it go a step further and have also some functionality? > Converting the data into a Python object and back? > Yes, I had considered it to do that. That's why the setfunc and getfunc functions were written the way they were. -teo From greg.ewing at canterbury.ac.nz Fri Nov 3 00:52:44 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 12:52:44 +1300 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: Message-ID: <454A84CC.9000905@canterbury.ac.nz> Alexander Belopolsky wrote: > In ctypes arrays of different > shapes are represented using different types. 
As a result, if the object > exporting its buffer is resized, the datatype object cannot be reused, it > has to be replaced. I was thinking about that myself the other day. I was thinking that both ctypes and NumPy arrays + proposed_type_descriptor provide a way of describing an array of binary data and providing Python-level access to that data. So a NumPy array and an instance of a ctypes type that happens to describe an array are very similar things. I was wondering whether they could be unified somehow. But then I realised that the ctypes array is a fixed-size array, whereas NumPy's notion of an array is rather more flexible. So they're not really the same thing after all. However, the *elements* of the array are fixed size in both cases, so the respective descriptions of the element type could potentially have something in common. My current take on the situation is that Travis is probably right about ctypes types being too cumbersome for what he has in mind. The next best thing would be to make them interoperate: have an easy way of getting a ctypes type corresponding to a given data layout description and vice versa. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 00:57:21 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 12:57:21 +1300 Subject: [Python-Dev] PEP: Extending the buffer protocol to share array information. In-Reply-To: References: <4547BF86.6070806@v.loewis.de> Message-ID: <454A85E1.90902@canterbury.ac.nz> Fredrik Lundh wrote: > the "right solution" for things like this is an *API* that lets you do > things like: > > view = object.acquire_view(region, supported formats) And how do you describe the "supported formats"? That's where Travis's proposal comes in, as far as I can see. 
-- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:04:07 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:04:07 +1300 Subject: [Python-Dev] Path object design In-Reply-To: <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> Message-ID: <454A9587.6030806@canterbury.ac.nz> Mike Orr wrote: >> * This is confusing as heck: >> >>> os.path.join("hello", "/world") >> '/world' It's only confusing if you're not thinking of pathnames as abstract entities. There's a reason for this behaviour -- it's so you can do things like full_path = os.path.join(default_dir, filename_from_user) where filename_from_user can be either a relative or absolute path at his discretion. In other words, os.path.join doesn't just mean "join these two paths together", it means "interpret the second path in the context of the first". Having said that, I can see there could be an element of confusion in calling it "join". -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:04:13 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:04:13 +1300 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: <454A958D.9070002@canterbury.ac.nz> Travis Oliphant wrote: > or just > > numpy.array(array.array('d',[1,2,3])) > > and leave-out the buffer object all together. I think the buffer object in his example was just a placeholder for "some arbitrary object that supports the buffer interface", not necessarily another NumPy array. 
-- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:04:19 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:04:19 +1300 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: <45490DF9.4070500@v.loewis.de> Message-ID: <454A9593.5000807@canterbury.ac.nz> Alexander Belopolsky wrote: > That's a question for Travis, but I would think that they would be > immutable at the Python level, but mutable at the C level. Well, anything's mutable at the C level -- the question is whether you *should* be mutating it. I think the datatype object should almost certainly be immutable. Since it's separated from the data it's describing, it's possible for one datatype object to describe multiple chunks of data. So you wouldn't want to mutate one in case it's being used for something else that you don't know about. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:04:23 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:04:23 +1300 Subject: [Python-Dev] Path object design In-Reply-To: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: <454A9597.7090102@canterbury.ac.nz> glyph at divmod.com wrote: > >>> os.path.join("hello", "slash/world") > 'hello/slash/world' > >>> os.path.join("hello", "slash//world") > 'hello/slash//world' > Trying to formulate a general rule for what the arguments to > os.path.join are supposed to be is really hard. If you're serious about writing platform-agnostic pathname code, you don't put slashes in the arguments at all. Instead you do os.path.join("hello", "slash", "world") Many of the other things you mention are also a result of not treating pathnames as properly opaque objects. If you're saying that the fact they're strings makes it easy to forget that you're supposed to be treating them opaquely, there may be merit in that view. 
It would be an argument for making path objects a truly opaque type instead of a subclass of string or tuple. > * although individual operations are atomic, shutil.copytree and > friends aren't. I've often seen python programs confused by > partially-copied trees of files. I can't see how this can be even remotely regarded as a pathname issue, or even a filesystem interface issue. It's no different to any other situation where a piece of code can fall over and leave a partial result behind. As always, the cure is defensive coding (clean up a partial result on error, or be prepared to tolerate the presence of a previous partial result when re-trying). It could be argued that shutil.copytree should clean up after itself if there is an error, but that might not be what you want -- e.g. you might want to find out how far it got, and maybe carry on from there next time. It's probably better to leave things like that to the caller. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:11:54 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:11:54 +1300 Subject: [Python-Dev] Path object design In-Reply-To: <6e9196d20611011836k5e62990pf8851d066ea120b2@mail.gmail.com> References: <20061101222903.14394.973593042.divmod.xquotient.391@joule.divmod.com> <6e9196d20611011836k5e62990pf8851d066ea120b2@mail.gmail.com> Message-ID: <454A975A.1050708@canterbury.ac.nz> Mike Orr wrote: > I have no idea why Microsoft thought it was a good idea to > put the seven-odd device files in every directory. Why not force > people to type the colon ("CON:"). Yes, this is a particularly stupid piece of braindamage on the part of the designers of MS-DOS. As far as I remember, even CP/M (which was itself a severely warped and twisted version of RT11) had the good sense to put colons on the end of such things. But maybe "design" is too strong a word to apply to MS-DOS... Anyhow, I think I agree that there's really nothing a path library can do about this. 
Whatever it tries to do, the fact will remain that it's impossible to have a regular file called "con", and users will have to live with that. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:16:23 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:16:23 +1300 Subject: [Python-Dev] Path object design In-Reply-To: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> References: <20061102031827.14394.636993831.divmod.xquotient.499@joule.divmod.com> Message-ID: <454A9867.8030807@canterbury.ac.nz> glyph at divmod.com wrote: > Relative > paths, if they should exist at all, should have to be explicitly linked > as relative to something *else* (e.g. made absolute) before they can be > used. If paths were opaque objects, this could be enforced by not having any way of constructing a path that wasn't rooted in some existing absolute path. -- Greg From greg.ewing at canterbury.ac.nz Fri Nov 3 02:39:41 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 14:39:41 +1300 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: References: <4548FE08.7070402@v.loewis.de> Message-ID: <454A9DDD.8040000@canterbury.ac.nz> Travis E. Oliphant wrote: > We have T_UBYTE and T_BYTE, etc. defined > in structmember.h already. Should we just re-use those #defines while > adding to them to make an easy to use interface for primitive types? They're mixed up with size information, though, which we don't want to do. -- Greg From glyph at divmod.com Fri Nov 3 02:54:48 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Fri, 03 Nov 2006 01:54:48 -0000 Subject: [Python-Dev] Path object design Message-ID: <20061103015448.14394.1229016541.divmod.xquotient.630@joule.divmod.com> On 01:04 am, greg.ewing at canterbury.ac.nz wrote: >glyph at divmod.com wrote: >If you're serious about writing platform-agnostic >pathname code, you don't put slashes in the arguments >at all. 
Instead you do > > os.path.join("hello", "slash", "world") > >Many of the other things you mention are also a >result of not treating pathnames as properly opaque >objects. Of course nobody who cares about these issues is going to put constant forward slashes into pathnames. The point is not that you'll forget you're supposed to be dealing with pathnames; the point is that you're going to get input from some source that you've got very little control over, and *especially* if that source is untrusted (although sometimes just due to mistakes) there are all kinds of ways it can trip you up. Did you accidentally pass it through something that doubles or undoubles all backslashes, etc. Sometimes these will result in harmless errors anyway, sometimes it's a critical error that will end up trying to delete /usr instead of /home/user/installer-build/ROOT/usr. If you have the path library catching these problems for you then a far greater percentage fall into the former category. >If you're saying that the fact they're strings makes >it easy to forget that you're supposed to be treating >them opaquely, That's exactly what I'm saying. >> * although individual operations are atomic, shutil.copytree and friends >>aren't. I've often seen python programs confused by partially-copied trees >>of files. >I can't see how this can be even remotely regarded >as a pathname issue, or even a filesystem interface >issue. It's no different to any other situation >where a piece of code can fall over and leave a >partial result behind. It is a bit of a stretch, I'll admit, but I included it because it is a weakness of the path library that it is difficult to do the kind of parallel iteration required to implement tree-copying yourself. If that were trivial, then you could write your own file-copying loop and cope with errors yourself. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061103/e2c92bb2/attachment.html From alexander.belopolsky at gmail.com Fri Nov 3 02:55:41 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 2 Nov 2006 20:55:41 -0500 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <454A9593.5000807@canterbury.ac.nz> References: <45490DF9.4070500@v.loewis.de> <454A9593.5000807@canterbury.ac.nz> Message-ID: <3FCF9851-D7A5-4110-BBD4-94EA07AA1C83@gmail.com> On Nov 2, 2006, at 8:04 PM, Greg Ewing wrote: > > I think the datatype object should almost certainly > be immutable. Since it's separated from the data > it's describing, it's possible for one datatype > object to describe multiple chunks of data. So > you wouldn't want to mutate one in case it's being > used for something else that you don't know about. I only mentioned that the datatype object would be mutable at C level because changing the object instead of deleting and creating a new one could be a valid optimization in situations where the object is known not to be shared. My main concern was that in ctypes the size of an array is a part of the datatype object and this seems to be redundant if used for the buffer protocol. Buffer protocol already reports the size of the buffer as a return value of bf_get*buffer methods. In another post, Greg Ewing wrote: > > numpy.array(array.array('d',[1,2,3])) > > > > and leave-out the buffer object all together. > I think the buffer object in his example was just a > placeholder for "some arbitrary object that supports > the buffer interface", not necessarily another NumPy > array. Yes, thanks. In fact numpy.array(array.array('d',[1,2,3])) already works in numpy (I think because numpy knows about the standard library array type). In my example, I wanted to use an object that supports buffer protocol and little else.
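The type-information gap being discussed here can be shown with nothing but the standard library. The sketch below is illustrative only: it contrasts a bytes-only view, where the consumer must know the element format out of band, with a format-carrying view. The `memoryview` API used here is the eventual product of this extended-buffer-protocol work (PEP 3118) and did not exist when this thread was written.

```python
# Illustrative sketch: a bytes-only view of an array versus a view
# that carries data-format information along with the memory.
import array
import struct

a = array.array('d', [1.0, 2.0, 3.0])

# Bytes-only: the 'd' (double) format must be known out of band.
raw = a.tobytes()
decoded = struct.unpack('3d', raw)      # consumer supplies the format

# Format-aware: the view itself reports typecode and item size, so a
# consumer such as numpy.array() could reconstruct the element type.
view = memoryview(a)
print(view.format, view.itemsize)       # d 8
print(view.tolist())                    # [1.0, 2.0, 3.0]
```

With a view like this, a consumer no longer has to fall back on the sequence protocol just to learn that the elements are doubles.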
From greg.ewing at canterbury.ac.nz Fri Nov 3 03:25:15 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 15:25:15 +1300 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <3FCF9851-D7A5-4110-BBD4-94EA07AA1C83@gmail.com> References: <45490DF9.4070500@v.loewis.de> <454A9593.5000807@canterbury.ac.nz> <3FCF9851-D7A5-4110-BBD4-94EA07AA1C83@gmail.com> Message-ID: <454AA88B.2070900@canterbury.ac.nz> Alexander Belopolsky wrote: > My main concern was that in ctypes the size of an array is a part of > the datatype object and this seems to be redundant if used for the > buffer protocol. Buffer protocol already reports the size of the > buffer as a return value of bf_get*buffer methods. I think what would happen if you were interoperating with ctypes is that you would get a datatype describing one element of the array, together with the shape information, and construct a ctypes array type from that. And going the other way, from a ctypes array type you would extract an element datatype and a shape. -- Greg From steve at holdenweb.com Fri Nov 3 03:55:28 2006 From: steve at holdenweb.com (Steve Holden) Date: Fri, 03 Nov 2006 02:55:28 +0000 Subject: [Python-Dev] Path object design In-Reply-To: <454A9587.6030806@canterbury.ac.nz> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> <454A9587.6030806@canterbury.ac.nz> Message-ID: Greg Ewing wrote: > Mike Orr wrote: > > >>>* This is confusing as heck: >>> >>> os.path.join("hello", "/world") >>> '/world' > > > It's only confusing if you're not thinking of > pathnames as abstract entities. > > There's a reason for this behaviour -- it's > so you can do things like > > full_path = os.path.join(default_dir, filename_from_user) > > where filename_from_user can be either a relative > or absolute path at his discretion. 
> > In other words, os.path.join doesn't just mean "join > these two paths together", it means "interpret the > second path in the context of the first". > > Having said that, I can see there could be an > element of confusion in calling it "join". > Good point. "relativise" might be appropriate, though something shorter would be better. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From alexander.belopolsky at gmail.com Fri Nov 3 04:20:22 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Fri, 3 Nov 2006 03:20:22 +0000 (UTC) Subject: [Python-Dev] PEP: Adding data-type objects to Python References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> <45494519.4020501@ieee.org> <79990c6b0611020053w6a11c424yfc6d329ab48e4a90@mail.gmail.com> Message-ID: Paul Moore gmail.com> writes: > Somewhat. My understanding is that the python-level buffer object is > frowned upon as not good practice, and is scheduled for removal at > some point (Py3K, quite possibly?) Hence, any code that uses buffer() > feels like it "needs" to be replaced by something "more acceptable". Python 2.x buffer object serves two distinct purposes. First, it is a "mutable string" object and this is definitely not going away being replaced by the bytes object. (Interestingly, this functionality is not exposed to python, but C extension modules can call PyBuffer_New(size) to create a buffer.) Second, it is a "view" into any object supporting buffer protocol. For a while this usage was indeed frowned upon because buffer objects held the pointer obtained from bf_get*buffer for too long causing memory errors in situations like this: >>> a = array('c', "x"*10) >>> b = buffer(a, 5, 2) >>> a.extend('x'*1000) >>> str(b) 'xx' This problem was fixed more than two years ago. 
------ r35400 | nascheme | 2004-03-10 Make buffer objects based on mutable objects (like array) safe. ------ Even though it was suggested in the past that buffer *object* should be deprecated as unsafe, I don't remember seeing a call to deprecate the buffer protocol. > So although I understand the use you suggest, it's not compelling to > me because I am left with the feeling that I wish I knew "the way to > do it that didn't need the buffer object" (even though I realise > intellectually that such a way may not exist). > As I explained in another post, I used buffer object as an example of an object that supports buffer protocol, but does not export type information in the form usable by numpy. Here is another way to illustrate the problem: >>> a = numpy.array(array.array('H', [1,2,3])) >>> b = numpy.array([1,2,3],dtype='H') >>> a.dtype == b.dtype False With the extended buffer protocol it will be possible for numpy.array(..) to realize that array.array('H', [1,2,3]) is a sequence of unsigned short integers and convert it accordingly. Currently numpy has to go through the sequence protocol to create a numpy.array from an array.array and lose the type information. From alexander.belopolsky at gmail.com Fri Nov 3 04:36:59 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Thu, 2 Nov 2006 22:36:59 -0500 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <454AA88B.2070900@canterbury.ac.nz> References: <45490DF9.4070500@v.loewis.de> <454A9593.5000807@canterbury.ac.nz> <3FCF9851-D7A5-4110-BBD4-94EA07AA1C83@gmail.com> <454AA88B.2070900@canterbury.ac.nz> Message-ID: On Nov 2, 2006, at 9:25 PM, Greg Ewing wrote: > > I think what would happen if you were interoperating with > ctypes is that you would get a datatype describing one > element of the array, together with the shape information, > and construct a ctypes array type from that.
And going > the other way, from a ctypes array type you would extract > an element datatype and a shape. Correct, assuming Travis' approach is accepted. However I understood that Martin was suggesting that ctypes types should be used to describe the structure of the buffer. Thus a buffer containing 10 integers would report its datatype as c_int*10. I was probably mistaken and Martin was suggesting the same as you. In this case extended buffer protocol would still use a different model from ctype and "don't reinvent the wheel" argument goes away. From greg.ewing at canterbury.ac.nz Fri Nov 3 07:31:57 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Fri, 03 Nov 2006 19:31:57 +1300 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> <454A9587.6030806@canterbury.ac.nz> Message-ID: <454AE25D.9090507@canterbury.ac.nz> Steve Holden wrote: > Greg Ewing wrote: > >>Having said that, I can see there could be an >>element of confusion in calling it "join". >> > > Good point. "relativise" might be appropriate, Sounds like something to make my computer go at warp speed, which would be nice, but I won't be expecting a patch any time soon. :-) -- Greg From talin at acm.org Fri Nov 3 07:35:11 2006 From: talin at acm.org (Talin) Date: Thu, 02 Nov 2006 22:35:11 -0800 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <6e9196d20611011746w38f104eerc55d61cf1e1ac3c6@mail.gmail.com> <454A9587.6030806@canterbury.ac.nz> Message-ID: <454AE31F.1050300@acm.org> Steve Holden wrote: > Greg Ewing wrote: >> Mike Orr wrote: >> Having said that, I can see there could be an >> element of confusion in calling it "join". >> > Good point. "relativise" might be appropriate, though something shorter > would be better. 
> > regards > Steve The term used in many languages for this sort of operation is "combine". (See .Net System.IO.Path for an example.) I kind of like the term - it implies that you are mixing two paths together, but it doesn't imply that the combination will be additive. - Talin From dalke at dalkescientific.com Fri Nov 3 17:58:54 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 3 Nov 2006 17:58:54 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: glyph: > Path manipulation: > > * This is confusing as heck: > >>> os.path.join("hello", "/world") > '/world' > >>> os.path.join("hello", "slash/world") > 'hello/slash/world' > >>> os.path.join("hello", "slash//world") > 'hello/slash//world' > Trying to formulate a general rule for what the arguments to os.path.join > are supposed to be is really hard. I can't really figure out what it would > be like on a non-POSIX/non-win32 platform. Made trickier by the similar yet different behaviour of urlparse.urljoin. >>> import urlparse >>> urlparse.urljoin("hello", "/world") '/world' >>> urlparse.urljoin("hello", "slash/world") 'slash/world' >>> urlparse.urljoin("hello", "slash//world") 'slash//world' >>> It does not make sense to me that these should be different. Andrew dalke at dalkescientific.com [Apologies to glyph for the dup; mixed up the reply-to. Still getting used to gmail.] 
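The divergence between the two "join" operations is easy to reproduce side by side. The snippet below uses the modern module names (urlparse became urllib.parse in Python 3) and posixpath rather than os.path so the results are the same on any platform; the outputs match the interpreter transcripts quoted in this thread.

```python
# Side-by-side comparison of the two "join" semantics discussed here.
import posixpath                  # os.path on POSIX, for a stable demo
from urllib.parse import urljoin  # urlparse.urljoin in Python 2

# os.path.join treats the base as a directory and appends to it...
print(posixpath.join("hello", "slash/world"))  # hello/slash/world

# ...while urljoin first strips the last segment of the base (per the
# RFC, a base of "hello" names a document, not a directory).
print(urljoin("hello", "slash/world"))         # slash/world

# The two agree only when the second argument is absolute.
print(posixpath.join("hello", "/world"))       # /world
print(urljoin("hello", "/world"))              # /world
```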
From steve at holdenweb.com Fri Nov 3 19:38:21 2006 From: steve at holdenweb.com (Steve Holden) Date: Fri, 03 Nov 2006 18:38:21 +0000 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: Andrew Dalke wrote: > glyph: > >>Path manipulation: >> >> * This is confusing as heck: >> >>> os.path.join("hello", "/world") >> '/world' >> >>> os.path.join("hello", "slash/world") >> 'hello/slash/world' >> >>> os.path.join("hello", "slash//world") >> 'hello/slash//world' >> Trying to formulate a general rule for what the arguments to os.path.join >>are supposed to be is really hard. I can't really figure out what it would >>be like on a non-POSIX/non-win32 platform. > > > Made trickier by the similar yet different behaviour of urlparse.urljoin. > > >>> import urlparse > >>> urlparse.urljoin("hello", "/world") > '/world' > >>> urlparse.urljoin("hello", "slash/world") > 'slash/world' > >>> urlparse.urljoin("hello", "slash//world") > 'slash//world' > >>> > > It does not make sense to me that these should be different. > Although the last two smell like bugs, the point of urljoin is to make an absolute URL from an absolute ("current page") URL and a relative (link) one. As we see: >>> urljoin("/hello", "slash/world") '/slash/world' and >>> urljoin("http://localhost/hello", "slash/world") 'http://localhost/slash/world' but >>> urljoin("http://localhost/hello/", "slash/world") 'http://localhost/hello/slash/world' >>> urljoin("http://localhost/hello/index.html", "slash/world") 'http://localhost/hello/slash/world' >>> I think we can probably conclude that this is what's supposed to happen. In the case of urljoin the first argument is interpreted as referencing an existing resource and the second as a link such as might appear in that resource. 
regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From fredrik at pythonware.com Fri Nov 3 20:04:40 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 03 Nov 2006 20:04:40 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: Steve Holden wrote: > Although the last two smell like bugs, the point of urljoin is to make > an absolute URL from an absolute ("current page") URL also known as a base URL: http://www.w3.org/TR/html4/struct/links.html#h-12.4.1 (os.path.join's behaviour is also well-defined, btw; if any component is an absolute path, all preceding components are ignored.) From martin at v.loewis.de Sat Nov 4 00:32:57 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 00:32:57 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> Message-ID: <454BD1A9.8080508@v.loewis.de> Andrew Dalke schrieb: > >>> import urlparse > >>> urlparse.urljoin("hello", "/world") > '/world' > >>> urlparse.urljoin("hello", "slash/world") > 'slash/world' > >>> urlparse.urljoin("hello", "slash//world") > 'slash//world' > >>> > > It does not make sense to me that these should be different. Just in case this isn't clear from Steve's and Fredrik's post: The behaviour of this function is (or should be) specified by an IETF RFC. If somebody finds that non-intuitive, that's likely because their mental model of relative URIs deviates from the RFC's model. Of course, there is also the chance that the implementation deviates from the RFC; that would be a bug.
Regards, Martin From scott+python-dev at scottdial.com Sat Nov 4 01:07:35 2006 From: scott+python-dev at scottdial.com (Scott Dial) Date: Fri, 03 Nov 2006 19:07:35 -0500 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <45494519.4020501@ieee.org> References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> <45494519.4020501@ieee.org> Message-ID: <454BD9C7.9050001@scottdial.com> Travis Oliphant wrote: > Paul Moore wrote: >> Enough of the abstract. As a concrete example, suppose I have a (byte) >> string in my program containing some binary data - an ID3 header, or a >> TCP packet, or whatever. It doesn't really matter. Does your proposal >> offer anything to me in how I might manipulate that data (assuming I'm >> not using NumPy)? (I'm not insisting that it should, I'm just trying >> to understand the scope of the PEP). >> > > What do you mean by "manipulate the data." The proposal for a > data-format object would help you describe that data in a standard way > and therefore share that data between several library that would be able > to understand the data (because they all use and/or understand the > default Python way to handle data-formats). > Perhaps the most relevant thing to pull from this conversation is back to what Martin has asked about before: "flexible array members". A TCP packet has no defined length (there isn't even a header field in the packet for this, so in fairness we can talk about IP packets which do). There is no way for me to describe this with the pre-PEP data-formats. I feel like it is misleading of you to say "it's up to the package to do manipulations," because you glanced over the fact that you can't even describe this type of data. ISTM, that you're only interested in describing repetitious fixed-structure arrays. If we are going to have a "default Python way to handle data-formats", then don't you feel like this falls short of the mark? 
I fear that you speak about this in too grandiose terms and are now trapped by people asking, "well, can I do this?" I think for a lot of folks the answer is: "nope." With respect to the network packets, this PEP doesn't do anything to fix the communication barrier. Is this not in the scope of "a consistent and standard way to discuss the format of binary data" (which is what your PEP's abstract sets out as the task)? -- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From dalke at dalkescientific.com Sat Nov 4 01:56:39 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sat, 4 Nov 2006 01:56:39 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <454BD1A9.8080508@v.loewis.de> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> Message-ID: Martin: > Just in case this isn't clear from Steve's and Fredrik's > post: The behaviour of this function is (or should be) > specified, by an IETF RFC. If somebody finds that non-intuitive, > that's likely because their mental model of relative URIs > deviate's from the RFC's model. While I didn't realize that urljoin is only supposed to be used with a base URL, where "base URL" (used in the docstring) has a specific requirement that it be absolute. I instead saw the word "join" and figured it's should do roughly the same things as os.path.join. >>> import urlparse >>> urlparse.urljoin("file:///path/to/hello", "slash/world") 'file:///path/to/slash/world' >>> urlparse.urljoin("file:///path/to/hello", "/slash/world") 'file:///slash/world' >>> import os >>> os.path.join("/path/to/hello", "slash/world") '/path/to/hello/slash/world' >>> It does not. 
My intuition, nowadays highly influenced by URLs, is that with a couple of hypothetical functions for going between filenames and URLs: os.path.join(absolute_filename, filename) == file_url_to_filename(urlparse.urljoin( filename_to_file_url(absolute_filename), filename_to_file_url(filename))) which is not the case. os.join assumes the base is a directory name when used in a join: "inserting '/' as needed" while RFC 1808 says The last segment of the base URL's path (anything following the rightmost slash "/", or the entire path if no slash is present) is removed Is my intuition wrong in thinking those should be the same? I suspect it is. I've been very glad that when I ask for a directory name that I don't need to check that it ends with a "/". Urljoin's behaviour is correct for what it's doing. os.path.join is better for what it's doing. (And about once a year I manually verify the difference because I get unsure.) I think these should not share the "join" in the name. If urljoin is not meant for relative base URLs, should it raise an exception when misused? Hmm, though the RFC algorithm does not have a failure mode and the result may be a relative URL. Consider >>> urlparse.urljoin("http://blah.com/a/b/c", "..") 'http://blah.com/a/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../") 'http://blah.com/a/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../..") 'http://blah.com/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../") 'http://blah.com/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../..") 'http://blah.com/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../") 'http://blah.com/../' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..") # What?! 'http://blah.com/' >>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../") 'http://blah.com/../../' >>> > Of course, there is also the chance that the implementation > deviates from the RFC; that would be a bug. The comment in urlparse # XXX The stuff below is bogus in various ways... 
is ever so reassuring. I suspect there's a bug given the previous code. Or I've a bad mental model. ;) Andrew dalke at dalkescientific.com From oliphant.travis at ieee.org Sat Nov 4 02:44:19 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri, 03 Nov 2006 18:44:19 -0700 Subject: [Python-Dev] PEP: Adding data-type objects to Python In-Reply-To: <454BD9C7.9050001@scottdial.com> References: <20061028135415.GA13049@code0.codespeak.net> <79990c6b0610310147q74851b19v55e7caab6f87c444@mail.gmail.com> <45494519.4020501@ieee.org> <454BD9C7.9050001@scottdial.com> Message-ID: <454BF073.2050402@ieee.org> > > Perhaps the most relevant thing to pull from this conversation is back > to what Martin has asked about before: "flexible array members". A TCP > packet has no defined length (there isn't even a header field in the > packet for this, so in fairness we can talk about IP packets which > do). There is no way for me to describe this with the pre-PEP > data-formats. > > I feel like it is misleading of you to say "it's up to the package to > do manipulations," because you glanced over the fact that you can't > even describe this type of data. ISTM, that you're only interested in > describing repetitious fixed-structure arrays. Yes, that's right. I'm only interested in describing binary data with a fixed length. Others can help push it farther than that (if they even care). > If we are going to have a "default Python way to handle data-formats", > then don't you feel like this falls short of the mark? Not for me. We can fix what needs fixing, but not if we can't get out of the gate. > > I fear that you speak about this in too grandiose terms and are now > trapped by people asking, "well, can I do this?" I think for a lot of > folks the answer is: "nope." With respect to the network packets, this > PEP doesn't do anything to fix the communication barrier. Yes it could if you were interested in pushing it there. 
No, I didn't solve that particular problem with the PEP (because I can only solve the problems I'm aware of), but I do think the problem could be solved. We have far too many nay-sayers on this list, I think. Right now, I don't have time to push this further. My real interest is the extended buffer protocol. I want something that works for that. When I do have time again to discuss it again, I might come back and push some more. But, not now. -Travis From pje at telecommunity.com Sat Nov 4 03:09:47 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Fri, 03 Nov 2006 21:09:47 -0500 Subject: [Python-Dev] Path object design In-Reply-To: References: <454BD1A9.8080508@v.loewis.de> <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> Message-ID: <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote: >os.join assumes the base is a directory >name when used in a join: "inserting '/' as needed" while RFC >1808 says > > The last segment of the base URL's path (anything > following the rightmost slash "/", or the entire path if no > slash is present) is removed > >Is my intuition wrong in thinking those should be the same? Yes. :) Path combining and URL absolutization(?) are inherently different operations with only superficial similarities. One reason for this is that a trailing / on a URL has an actual meaning, whereas in filesystem paths a trailing / is an aberration and likely an actual error. The path combining operation says, "treat the following as a subpath of the base path, unless it is absolute". The URL normalization operation says, "treat the following as a subpath of the location the base URL is *contained in*". Because of this, os.path.join assumes a path with a trailing separator is equivalent to a path without one, since that is the only reasonable way to interpret treating the joined path as a subpath of the base path. 
But for a URL join, the path /foo and the path /foo/ are not only *different paths* referring to distinct objects, but the operation wants to refer to the *container* of the referenced object. /foo might refer to a directory, while /foo/ refers to some default content (e.g. index.html). This is actually why Apache normally redirects you from /foo to /foo/ before it serves up the index.html; relative URLs based on a base URL of /foo won't work right. The URL approach is designed to make peer-to-peer linking in a given directory convenient. Instead of referring to './foo.html' (as one would have to do with filenames, you can simply refer to 'foo.html'. But the cost of saving those characters in every link is that joining always takes place on the parent, never the tail-end. Thus directory URLs normally end in a trailing /, and most tools tend to automatically redirect when somebody leaves it off. (Because otherwise the links would be wrong.) From steve at holdenweb.com Sat Nov 4 05:34:12 2006 From: steve at holdenweb.com (Steve Holden) Date: Sat, 04 Nov 2006 04:34:12 +0000 Subject: [Python-Dev] Path object design In-Reply-To: <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> References: <454BD1A9.8080508@v.loewis.de> <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: Phillip J. Eby wrote: > At 01:56 AM 11/4/2006 +0100, Andrew Dalke wrote: > >>os.join assumes the base is a directory >>name when used in a join: "inserting '/' as needed" while RFC >>1808 says >> >> The last segment of the base URL's path (anything >> following the rightmost slash "/", or the entire path if no >> slash is present) is removed >> >>Is my intuition wrong in thinking those should be the same? > > > Yes. :) > > Path combining and URL absolutization(?) are inherently different > operations with only superficial similarities. 
One reason for this is that > a trailing / on a URL has an actual meaning, whereas in filesystem paths a > trailing / is an aberration and likely an actual error. > > The path combining operation says, "treat the following as a subpath of the > base path, unless it is absolute". The URL normalization operation says, > "treat the following as a subpath of the location the base URL is > *contained in*". > > Because of this, os.path.join assumes a path with a trailing separator is > equivalent to a path without one, since that is the only reasonable way to > interpret treating the joined path as a subpath of the base path. > > But for a URL join, the path /foo and the path /foo/ are not only > *different paths* referring to distinct objects, but the operation wants to > refer to the *container* of the referenced object. /foo might refer to a > directory, while /foo/ refers to some default content (e.g. > index.html). This is actually why Apache normally redirects you from /foo > to /foo/ before it serves up the index.html; relative URLs based on a base > URL of /foo won't work right. > > The URL approach is designed to make peer-to-peer linking in a given > directory convenient. Instead of referring to './foo.html' (as one would > have to do with filenames, you can simply refer to 'foo.html'. But the > cost of saving those characters in every link is that joining always takes > place on the parent, never the tail-end. Thus directory URLs normally end > in a trailing /, and most tools tend to automatically redirect when > somebody leaves it off. (Because otherwise the links would be wrong.) > Having said this, Andrew *did* demonstrate quite convincingly that the current urljoin has some fairly egregious directory traversal glitches. Is it really right to punt obvious gotchas like >>>urlparse.urljoin("http://blah.com/a/b/c", "../../../../") 'http://blah.com/../../' >>> to the server? 
regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From ncoghlan at gmail.com Sat Nov 4 05:38:53 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sat, 04 Nov 2006 14:38:53 +1000 Subject: [Python-Dev] Path object design In-Reply-To: References: <454BD1A9.8080508@v.loewis.de> <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: <454C195D.2060901@gmail.com> Steve Holden wrote: > Having said this, Andrew *did* demonstrate quite convincingly that the > current urljoin has some fairly egregious directory traversal glitches. > Is it really right to punt obvious gotchas like > > >>>urlparse.urljoin("http://blah.com/a/b/c", "../../../../") > > 'http://blah.com/../../' > > >>> > > to the server? See Paul Jimenez's thread about replacing urlparse with something better. The current module has some serious issues :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From seefeld at sympatico.ca Wed Nov 1 20:14:05 2006 From: seefeld at sympatico.ca (Stefan Seefeld) Date: Wed, 01 Nov 2006 14:14:05 -0500 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> Message-ID: <4548F1FD.5010505@sympatico.ca> Brett Cannon wrote: > On 11/1/06, Stefan Seefeld wrote: >> Right. Brett, do we need accounts on python.org for this ? > > > Yep. It just requires SSH 2 keys from each of you. 
You can then email > python-dev with those keys and your first.last name and someone there will > install the keys for you. My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan Seefeld. Thanks ! Stefan -- ...ich hab' noch einen Koffer in Berlin... From forsberg at efod.se Wed Nov 1 20:25:03 2006 From: forsberg at efod.se (Erik Forsberg) Date: Wed, 01 Nov 2006 20:25:03 +0100 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: (Brett Cannon's message of "Wed, 1 Nov 2006 11:17:56 -0800") References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> <4548F1FD.5010505@sympatico.ca> Message-ID: <87slh3vuk0.fsf@uterus.efod.se> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 "Brett Cannon" writes: > On 11/1/06, Stefan Seefeld wrote: >> >> Brett Cannon wrote: >> > On 11/1/06, Stefan Seefeld wrote: >> >> >> Right. Brett, do we need accounts on python.org for this ? >> > >> > >> > Yep. It just requires SSH 2 keys from each of you. You can then email >> > python-dev with those keys and your first.last name and someone there >> will >> > install the keys for you. >> >> My key is at http://www3.sympatico.ca/seefeld/ssh.txt, I'm Stefan Seefeld. >> >> Thanks ! > > > Just to clarify, this is not for pydotorg but the svn.python.org. The > admins for our future Roundup instance are going to keep their Roundup code > in svn so they need commit access. Now when that's clarified, here's my data: Public SSH key: http://efod.se/about/ptkey.pub First.Lastname: erik.forsberg I'd appreciate if someone with good taste could tell us where in the tree we should add our code :-). 
Thanks, \EF - -- Erik Forsberg http://efod.se GPG/PGP Key: 1024D/0BAC89D9 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Processed by Mailcrypt 3.5.8+ iD8DBQFFSPSOrJurFAusidkRAucqAKDWdlq6dkI1nNt5caSyJ+gFviSeJACg4gNJ ItRUEsEI3/4ZN154Znw4jEQ= =o+Iy -----END PGP SIGNATURE----- From oliphant at ee.byu.edu Wed Nov 1 22:41:38 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed, 01 Nov 2006 14:41:38 -0700 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <45490989.9010603@v.loewis.de> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> <4549010F.6090200@ieee.org> <45490989.9010603@v.loewis.de> Message-ID: <45491492.9060208@ee.byu.edu> Martin v. L?wis wrote: >Travis Oliphant schrieb: > > >>>>r_field = PyDict_GetItemString(dtype,'r'); >>>> >>>> >>>> >>Actually it should read PyDict_GetItemString(dtype->fields). The >>r_field is a tuple (data-type object, offset). The fields attribute is >>(currently) a Python dictionary. >> >> > >Ok. This seems to be missing in the PEP. > Yeah, actually quite a bit is missing. Because I wanted to float the idea for discussion before "getting the details perfect" (which of course they wouldn't be if it was just my input producing them). >In this code, where is PyArray_GetField coming from? > This is a NumPy Specific C-API. That's why I was confused about why you wanted me to show how I would do it. But, what you are actually asking is how would another application use the data-type information to do the same thing using the data-type object and a pointer to memory. Is that correct? This is a reasonable thing to request. And your example is a good one. I will use the PEP to explain it. Ultimately, the code you are asking for will have to have some kind of dispatch table for different binary code depending on the actual data-types being shared (unless all that is needed is a copy in which case just the size of the element area can be used). 
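[The field-offset lookup Travis describes — find the field's (data-type, offset) pair, then read at that offset within each element — can be illustrated in pure Python with the struct module. This is a hypothetical miniature of what a consumer would do with the data-format object, not NumPy's actual C code; the record layout and the 'x' field are invented for the example, while 'r' echoes the r_field from Martin's question:]

```python
import struct

# A 12-byte "record": int32 'x' at offset 0, float64 'r' at offset 4.
# This mirrors the PEP's fields mapping: name -> (data-type, offset).
fields = {'x': ('<i', 0), 'r': ('<d', 4)}
itemsize = 12

def get_field(buf, name):
    """Extract one named field from every element of a flat memory block."""
    code, offset = fields[name]
    return [struct.unpack_from(code, buf, i * itemsize + offset)[0]
            for i in range(len(buf) // itemsize)]

# Pack three records and pull out the 'r' column.
data = b''.join(struct.pack('<id', i, i * 1.5) for i in range(3))
print(get_field(data, 'r'))   # the float64 field of each record
```

[For a strided block, the `i * itemsize` stepping would simply use the stride instead of the packed item size.]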
In my experience, the dispatch table must be present for at least the "simple" data-types. The data-types built up from there can depend on those. In NumPy, the data-type objects have function pointers to accomplish all the things NumPy does quickly. So, each data-type object in NumPy points to a function-pointer table and the NumPy code defers to it to actually accomplish the task (much like Python really). Not all libraries will support working with all data-types. If they don't support it, they just raise an error indicating that it's not possible to share that kind of data. > What does >it do? If I wanted to write this code from scratch, what >should I write instead? Since this is all about a flat >memory block, I'm surprised I need "true" Python objects >for the field values in there. > > Well, actually, the block could be "strided" as well. So, you would write something that gets the pointer to the memory and then gets the extended information (dimensionality, shape, and strides, and data-format object). Then, you would get the offset of the field you are interested in from the start of the element (it's stored in the data-format representation). Then do a memory copy from the right place (using the array iterator in NumPy you can actually do it without getting the shape and strides information first but I'm holding off on that PEP until an N-d array is proposed for Python). I'll write something like that as an example and put it in the PEP for the extended buffer protocol. -Travis > > >>But, the other option (especially for code already written) would be to >>just convert the data-format specification into it's own internal >>representation. >> >> > >Ok, so your assumption is that consumers already have their own >machinery, in which case ease-of-use would be the question how >difficult it is to convert datatype objects into the internal >representation. 
> >Regards, >Martin > > From micktwomey at gmail.com Thu Nov 2 12:09:05 2006 From: micktwomey at gmail.com (Michael Twomey) Date: Thu, 2 Nov 2006 11:09:05 +0000 Subject: [Python-Dev] [Tracker-discuss] Getting Started In-Reply-To: References: <87odrv6k2y.fsf@uterus.efod.se> <45454854.2080402@sympatico.ca> <50a522ca0611010610uf598b0elc3142b9af9de5a43@mail.gmail.com> <200611011532.42802.forsberg@efod.se> <4548B473.8020605@sympatico.ca> Message-ID: <50a522ca0611020309i5be21d99t8c39bbeb323289ed@mail.gmail.com> On 11/1/06, Brett Cannon wrote: > > > > Right. Brett, do we need accounts on python.org for this ? > > Yep. It just requires SSH 2 keys from each of you. You can then email > python-dev with those keys and your first.last name and someone there will > install the keys for you. > I'll need svn access to svn.python.org too for the roundup tracker. My key is over at http://translucentcode.org/mick/ssh_key.txt firstname.lastname: michael.twomey cheers, Michael From kxroberto at googlemail.com Fri Nov 3 11:50:05 2006 From: kxroberto at googlemail.com (Robert) Date: Fri, 03 Nov 2006 11:50:05 +0100 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) Message-ID: <454B1EDD.9050908@googlemail.com> repeated from c.l.p : "Feature Request: Py_NewInterpreter to create separate GIL (branch)" Daniel Dittmar wrote: > robert wrote: >> I'd like to use multiple CPU cores for selected time consuming Python >> computations (incl. numpy/scipy) in a frictionless manner. >> >> Interprocess communication is tedious and out of question, so I >> thought about simply using a more Python interpreter instances >> (Py_NewInterpreter) with extra GIL in the same process. > > If I understand Python/ceval.c, the GIL is really global, not specific > to an interpreter instance: > static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */ > Thats the show stopper as of now. There are only a handfull funcs in ceval.c to use that very global lock. 
The rest uses that funcs around thread states. Would it be a possibilty in next Python to have the lock separate for each Interpreter instance. Thus: have *interpreter_lock separate in each PyThreadState instance and only threads of same Interpreter have same GIL? Separation between Interpreters seems to be enough. The Interpreter runs mainly on the stack. Possibly only very few global C-level resources would require individual extra locks. Sooner or later Python will have to answer the multi-processor question. A per-interpreter GIL and a nice module for tunneling Python-Objects directly between Interpreters inside one process might be the answer at the right border-line ? Existing extension code base would remain compatible, as far as there is already decent locking on module globals, which is the the usual case. Robert From larry at hastings.org Sat Nov 4 07:38:45 2006 From: larry at hastings.org (Larry Hastings) Date: Fri, 03 Nov 2006 22:38:45 -0800 Subject: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom] In-Reply-To: <453985ED.7050303@hastings.org> References: <4523F890.9060804@hastings.org> <453985ED.7050303@hastings.org> Message-ID: <454C3575.1070807@hastings.org> On 2006/10/20, Larry Hastings wrote: > I'm ready to post the patch. Sheesh! Where does the time go. I've finally found the time to re-validate and post the patch. It's SF.net patch #1590352: http://sourceforge.net/tracker/index.php?func=detail&aid=1590352&group_id=5470&atid=305470 I've attached both the patch itself (against the current 2.6 revision, 52618) and a lengthy treatise on the patch and its ramifications as I understand them. I've also added one more experimental change: a new string method, str.simplify(). All it does is force a lazy concatenation / lazy slice to render. (If the string isn't a lazy string, or it's already been rendered, str.simplify() is a no-op.) 
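[The patch itself lives in the C string implementation, but the idea can be sketched in pure Python — an illustrative toy, not the actual code; `simplify` here mimics the proposed str.simplify(), and the class name is invented:]

```python
class LazyConcat:
    """Toy model of a lazy string: defer joining until the value is needed."""
    def __init__(self, *parts):
        self._parts = list(parts)   # unrendered pieces
        self._rendered = None

    def __add__(self, other):
        # O(1): record a reference to the new piece, copy nothing yet.
        return LazyConcat(*self._parts, other)

    def simplify(self):
        """Force rendering and drop the per-piece references (a no-op
        once rendered, like the proposed str.simplify())."""
        if self._rendered is None:
            self._rendered = ''.join(self._parts)
            self._parts = [self._rendered]
        return self._rendered

    def __str__(self):
        return self.simplify()

s = LazyConcat('spam')
for piece in ('', 'and', '', 'eggs'):
    s = s + piece           # builds up references, not new strings
print(str(s))               # the single join happens here
```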
The idea is, if you know these consarned "lazy slices" are giving you the oft-cited horrible memory usage scenario, you can tune your app by forcing the slices to render and drop their references. 99% of the time you don't care, and you enjoy the minor speedup. The other 1% of the time, you call .simplify() and your code behaves as it did under 2.5. Is this the right approach? I dunno. So far I like it better than the alternatives. But I'm open to suggestions, on this or any other aspect of the patch. Cheers, /larry/ From brett at python.org Sat Nov 4 08:20:23 2006 From: brett at python.org (Brett Cannon) Date: Fri, 3 Nov 2006 23:20:23 -0800 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: <454B1EDD.9050908@googlemail.com> References: <454B1EDD.9050908@googlemail.com> Message-ID: On 11/3/06, Robert wrote: > > repeated from c.l.p : "Feature Request: Py_NewInterpreter to create > separate GIL (branch)" > > Daniel Dittmar wrote: > > robert wrote: > >> I'd like to use multiple CPU cores for selected time consuming Python > >> computations (incl. numpy/scipy) in a frictionless manner. > >> > >> Interprocess communication is tedious and out of question, so I > >> thought about simply using a more Python interpreter instances > >> (Py_NewInterpreter) with extra GIL in the same process. > > > > If I understand Python/ceval.c, the GIL is really global, not specific > > to an interpreter instance: > > static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */ > > > > Thats the show stopper as of now. > There are only a handfull funcs in ceval.c to use that very global lock. > The rest uses that funcs around thread states. > > Would it be a possibilty in next Python to have the lock separate for > each Interpreter instance. > Thus: have *interpreter_lock separate in each PyThreadState instance and > only threads of same Interpreter have same GIL? > Separation between Interpreters seems to be enough. 
The Interpreter runs > mainly on the stack. Possibly only very few global C-level resources > would require individual extra locks. Right, but that's the trick. For instance extension modules are shared between interpreters. Also look at the sys module and basically anything that is set by a function call is a process-level setting that would also need protection. Then you get into the fun stuff of the possibility of sharing objects created in one interpreter and then passed to another that is not necessarily known ahead of time (whether it be directly through C code or through process-level objects such as an attribute in an extension module). It is not as simple, unfortunately, as a few locks. Sooner or later Python will have to answer the multi-processor question. > A per-interpreter GIL and a nice module for tunneling Python-Objects > directly between Interpreters inside one process might be the answer at > the right border-line ? Existing extension code base would remain > compatible, as far as there is already decent locking on module globals, > which is the the usual case. This is not true (see above). From my viewpoint the only way for this to work would be to come up with a way to wrap all access to module objects in extension modules so that they are not trampled on because of separate locks per-interpreter, or have to force all extension modules to be coded so that they are instantiated individually per interpreter. And of course deal with all other process-level objects somehow. The SMP issue for Python will most likely not happen until someone cares enough to write code to do it and this take on it is no exception. There is no simple solution or else someone would have done it by now. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061103/9ca05403/attachment.html From jcarlson at uci.edu Sat Nov 4 08:27:03 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Fri, 03 Nov 2006 23:27:03 -0800 Subject: [Python-Dev] The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom] In-Reply-To: <454C3575.1070807@hastings.org> References: <453985ED.7050303@hastings.org> <454C3575.1070807@hastings.org> Message-ID: <20061103231558.81F0.JCARLSON@uci.edu> Larry Hastings wrote: > But I'm open > to suggestions, on this or any other aspect of the patch. As Martin, I, and others have suggested, direct the patch towards Python 3.x unicode text. Also, don't be surprised if Guido says no... http://mail.python.org/pipermail/python-3000/2006-August/003334.html In that message he talks about why view+string or string+view or view+view should return strings. Some are not quite applicable in this case because with your implementation all additions can return a 'view'. However, he also states the following with regards to strings vs. views (an earlier variant of the "lazy strings" you propose), "Because they can have such different performance and memory usage characteristics, it's not right to treat them as the same type." - GvR This suggests (at least to me) that unifying the 'lazy string' with the 2.x string is basically out of the question, which brings me back to my earlier suggestion; make it into a wrapper that could be used with 3.x bytes, 3.x text, and perhaps others. 
- Josiah From martin at v.loewis.de Sat Nov 4 09:15:44 2006 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Sat, 04 Nov 2006 09:15:44 +0100 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> Message-ID: <454C4C30.6080209@v.loewis.de> Alexander Belopolsky schrieb: > Multi-segment buffers are only dead because standard library modules > do not support them. That, in turn, is because nobody has contributed code to make that work. My guess is that people either don't need it, or find it too difficult to implement. In any case, it is an important point that such a specification is likely dead if the standard library doesn't support it throughout, from start. So for this PEP, the same criterion likely applies: it's not sufficient to specify an interface, one also has to specify (and then implement) how that affects modules and types of the standard library. > I often work with text data that is represented > as an array of strings. I would love to implement a multi-segment > buffer interface on top of that data and be able to do a full text > regular expression search without having to concatenate into one big > string, but python's re module would not take a multi-segment buffer. If you are curious, try adding such a feature to re some time. I expect that implementing it would be quite involved. I wonder what Fredrik Lundh thinks about providing such a feature. Regards, Martin From martin at v.loewis.de Sat Nov 4 09:37:32 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 09:37:32 +0100 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: References: Message-ID: <454C514C.9000602@v.loewis.de> Paul Chiusano schrieb: > I was looking for a good pairing_heap implementation and came across > one that had apparently been checked in a couple years ago (!). Have you looked at the heapq module? 
What application do you have for a pairing heap that you can't do readily with the heapq module? Anyway, the immediate author of this code is Dan Stutzbach (as Raymond Hettinger's checkin message says); you probably should contact him to find out whether the project is still alive. Regards, Martin From martin at v.loewis.de Sat Nov 4 09:49:53 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 09:49:53 +0100 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: <454B1EDD.9050908@googlemail.com> References: <454B1EDD.9050908@googlemail.com> Message-ID: <454C5431.7080609@v.loewis.de> Robert schrieb: > Would it be a possibilty in next Python to have the lock separate for > each Interpreter instance. Thus: have *interpreter_lock separate in > each PyThreadState instance and only threads of same Interpreter have > same GIL? Separation between Interpreters seems to be enough. The > Interpreter runs mainly on the stack. Possibly only very few global > C-level resources would require individual extra locks. Notice that at least the following objects are shared between interpreters, as they are singletons: - None, True, False, (), "", u"" - strings of length 1, Unicode strings of length 1 with ord < 256 - integers between -5 and 256 How do you deal with the reference counters of these objects? Also, type objects (in particular exception types) are shared between interpreters. These are mutable objects, so you have actually dictionaries shared between interpreters. How would you deal with these? Also, the current thread state is a global variable, currently (_PyThreadState_Current). How would you provide access to the current thread state if there are multiple simultaneous threads? 
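[The singleton sharing behind the first question is easy to observe from Python — these are CPython implementation details, not language guarantees, which is exactly why their refcounts are a problem for a per-interpreter GIL:]

```python
# CPython caches the integers in [-5, 256] as process-wide singletons;
# int() calls defeat compile-time constant folding so the cache is what
# we actually observe.
a = int("100")
b = int("100")
print(a is b)        # True: both names refer to the one cached object

c = int("1000")
d = int("1000")
print(c is d)        # False in CPython: each call allocates a fresh int

# None is likewise a single object shared by every interpreter in the
# process, so its reference count is touched from all of them.
print(None is type(None)())
```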
Regards, Martin From martin at v.loewis.de Sat Nov 4 16:47:37 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 16:47:37 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa Message-ID: <454CB619.7010804@v.loewis.de> Patch #1346572 proposes to also search for .pyc when OptimizeFlag is set, and for .pyo when it is not set. The author argues this is for consistency, as the zipimporter already does that. This reasoning is somewhat flawed, of course: to achieve consistency, one could also change the zipimporter instead. However, I find the proposed behaviour reasonable: Python already automatically imports the .pyc file if .py is not given and vice versa. So why not look for .pyo if the .pyc file is not present? What do you think? Regards, Martin From murman at gmail.com Sat Nov 4 17:09:11 2006 From: murman at gmail.com (Michael Urman) Date: Sat, 4 Nov 2006 10:09:11 -0600 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: On 11/3/06, Steve Holden wrote: > Having said this, Andrew *did* demonstrate quite convincingly that the > current urljoin has some fairly egregious directory traversal glitches. > Is it really right to punt obvious gotchas like > > >>>urlparse.urljoin("http://blah.com/a/b/c", "../../../../") > > 'http://blah.com/../../' Ah, but how do you know when that's wrong? At least under ftp:// your root is often a mid-level directory until you change up out of it. http:// will tend to treat the targets as roots, but I don't know that there's any requirement for a /.. to be meaningless (even if it often is). 
-- Michael Urman http://www.tortall.net/../mu/blog ;) From phd at phd.pp.ru Sat Nov 4 17:47:37 2006 From: phd at phd.pp.ru (Oleg Broytmann) Date: Sat, 4 Nov 2006 19:47:37 +0300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454CB619.7010804@v.loewis.de> References: <454CB619.7010804@v.loewis.de> Message-ID: <20061104164737.GB29309@phd.pp.ru> On Sat, Nov 04, 2006 at 04:47:37PM +0100, "Martin v. Löwis" wrote: > Patch #1346572 proposes to also search for .pyc when OptimizeFlag > is set, and for .pyo when it is not set. The author argues this is > for consistency, as the zipimporter already does that. > > This reasoning is somewhat flawed, of course: to achieve consistency, > one could also change the zipimporter instead. > > However, I find the proposed behaviour reasonable: Python already > automatically imports the .pyc file if .py is not given and vice > versa. So why not look for .pyo if the .pyc file is not present? > > What do you think? +1 from me. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. From fredrik at pythonware.com Sat Nov 4 17:52:09 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 04 Nov 2006 17:52:09 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454CB619.7010804@v.loewis.de> References: <454CB619.7010804@v.loewis.de> Message-ID: Martin v. Löwis wrote: > However, I find the proposed behaviour reasonable: Python already > automatically imports the .pyc file if .py is not given and vice > versa. So why not look for .pyo if the .pyc file is not present? well, from a performance perspective, it would be nice if Python looked for *fewer* things, not more things. (wouldn't transparent import of PYO files mean that you end up with a program where some assertions apply, and others don't? could be confusing...)
From alexander.belopolsky at gmail.com Sat Nov 4 18:13:28 2006 From: alexander.belopolsky at gmail.com (Alexander Belopolsky) Date: Sat, 4 Nov 2006 12:13:28 -0500 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <454C4C30.6080209@v.loewis.de> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> <454C4C30.6080209@v.loewis.de> Message-ID: <6BD06AFE-6BA4-494F-BB68-B8EF651783EA@gmail.com> On Nov 4, 2006, at 3:15 AM, Martin v. Löwis wrote: > Alexander Belopolsky schrieb: >> Multi-segment buffers are only dead because standard library modules >> do not support them. > > That, in turn, is because nobody has contributed code to make that > work. > My guess is that people either don't need it, or find it too difficult > to implement. Last time I tried to contribute code related to buffer protocol, it was rejected with little discussion http://sourceforge.net/tracker/index.php?func=detail&aid=1539381&group_id=5470&atid=305470 that patch implemented two features: enabled creation of read-write buffer objects and added readinto method to StringIO. The resolution was: """ The file object's readinto method is not meant for public use, so adding the method to StringIO is not a good idea. """ The read-write buffer part was not discussed, but I guess the resolution would be that buffer objects are deprecated, so adding features to them is not a good idea. > > If you are curious, try adding such a feature to re some time. I > expect that implementing it would be quite involved. I wonder what > Fredrik Lundh thinks about providing such a feature. I would certainly invest some time into that if that feature had a chance of being accepted. At the moment I feel that anything related to buffers or buffer protocol is met with strong opposition. I think the opposition is mostly fueled by the belief that buffer objects are "unsafe" and buffer protocol is deprecated. None of these premises is correct AFAIK.
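[Part of what makes the re feature involved is matching across segment boundaries. As a sketch of the idea — a hypothetical pure-Python workaround, not a proposal for the re module itself — a fixed-length pattern can be found in a sequence of chunks by carrying only a small overlap forward, instead of concatenating everything into one big string:]

```python
def find_across_segments(segments, pattern):
    """Find `pattern` (bytes) in a sequence of byte chunks without
    joining them all; only len(pattern)-1 bytes are carried over."""
    carry = b''
    base = 0                          # absolute offset where `carry` starts
    for seg in segments:
        window = carry + seg
        hit = window.find(pattern)
        if hit != -1:
            return base + hit         # offset into the virtual whole string
        keep = min(len(pattern) - 1, len(window))
        base += len(window) - keep
        carry = window[-keep:] if keep else b''
    return -1

chunks = [b'abcd', b'efgh', b'ijkl']
print(find_across_segments(chunks, b'ghi'))   # match spans two chunks
```

[A real regex engine cannot bound the match length this way, which is one reason supporting multi-segment buffers in re would be far more work than this sketch suggests.]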
From steve at holdenweb.com Sat Nov 4 18:16:51 2006 From: steve at holdenweb.com (Steve Holden) Date: Sat, 04 Nov 2006 17:16:51 +0000 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: <454CCB03.1030806@holdenweb.com> Michael Urman wrote: > On 11/3/06, Steve Holden wrote: > >> Having said this, Andrew *did* demonstrate quite convincingly that the >> current urljoin has some fairly egregious directory traversal glitches. >> Is it really right to punt obvious gotchas like >> >> >>>urlparse.urljoin("http://blah.com/a/b/c", "../../../../") >> >> 'http://blah.com/../../' > > > Ah, but how do you know when that's wrong? At least under ftp:// your > root is often a mid-level directory until you change up out of it. > http:// will tend to treat the targets as roots, but I don't know that > there's any requirement for a /.. to be meaningless (even if it often > is). > I'm darned if I know. I simply know that it isn't right for http resources. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From martin at v.loewis.de Sat Nov 4 19:23:44 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 19:23:44 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454CB619.7010804@v.loewis.de> Message-ID: <454CDAB0.5040409@v.loewis.de> Fredrik Lundh schrieb: >> However, I find the proposed behaviour reasonable: Python already >> automatically imports the .pyc file if .py is not given and vice >> versa. So why not look for .pyo if the .pyc file is not present? 
> > well, from a performance perspective, it would be nice if Python looked > for *fewer* things, not more things. > > That's true. > > > (wouldn't transparent import of PYO files mean that you end up with a > > program where some assertions apply, and others don't? could be confusing...) > > That's also true, however, it might still be better to do that instead > of raising an ImportError. > > I'm not sure whether a scenario where you have only .pyo files for > some modules and only .pyc files for others is really likely, though, > and the performance hit of another system call doesn't sound attractive. > > So I guess that zipimport should stop importing .pyo files if > OptimizeFlag is false, then? Regards, Martin From fredrik at pythonware.com Sat Nov 4 19:33:10 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 04 Nov 2006 19:33:10 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <454CCB03.1030806@holdenweb.com> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> Message-ID: Steve Holden wrote: >> Ah, but how do you know when that's wrong? At least under ftp:// your >> root is often a mid-level directory until you change up out of it. >> http:// will tend to treat the targets as roots, but I don't know that >> there's any requirement for a /.. to be meaningless (even if it often >> is). >> > I'm darned if I know. I simply know that it isn't right for http resources. the URI specification disagrees; an URI that starts with "../" is perfectly legal, and the specification explicitly states how it should be interpreted. (it's important to realize that "urijoin" produces equivalent URI:s, not file names) From martin at v.loewis.de Sat Nov 4 20:00:55 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 04 Nov 2006 20:00:55 +0100 Subject: [Python-Dev] Status of pairing_heap.py?
In-Reply-To: References: <454C514C.9000602@v.loewis.de> Message-ID: <454CE367.7000604@v.loewis.de> Paul Chiusano schrieb: > To support this, the insert method needs to return a reference to an > object which I can then pass to adjust_key() and delete() methods. > It's extremely difficult to have this functionality with array-based > heaps because the index of an item in the array changes as items are > inserted and removed. I see. > Okay, I'll do that. What needs to be done to move the project along > and possibly get a pairing heap incorporated into a future version of > python? As a starting point, I think the implementation should get packaged as an independent library, and be listed in the Cheeseshop for a few years. If then there's wide interest in including it into Python, it should be reconsidered. At that point, the then-authors of the package will have to sign a contributor form. Regards, Martin From osantana at gmail.com Sat Nov 4 20:38:52 2006 From: osantana at gmail.com (Osvaldo Santana) Date: Sat, 4 Nov 2006 16:38:52 -0300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454CDAB0.5040409@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454CDAB0.5040409@v.loewis.de> Message-ID: Hi, I'm the author of this patch and we are already using it in Python port for Maemo platform. We are using .pyo modules mainly to remove docstrings from the modules. We've discussed about this patch here[1] before. Now, I agree that the zipimport behaviour is incorrect but I don't have other option to remove docstrings of a .pyc file. I'm planning to send a patch that adds a "--remove-docs" to the Python interpreter to replace the "-OO" option that create only .pyo files. [1] http://mail.python.org/pipermail/python-dev/2005-November/057959.html On 11/4/06, "Martin v. 
L?wis" wrote: > Fredrik Lundh schrieb: > >> However, I find the proposed behaviour reasonable: Python already > >> automatically imports the .pyc file if .py is not given and vice > >> versa. So why not look for .pyo if the .pyc file is not present? > > > > well, from a performance perspective, it would be nice if Python looked > > for *fewer* things, not more things. > > That's true. [cut] -- Osvaldo Santana Neto (aCiDBaSe) http://www.pythonologia.org From brett at python.org Sat Nov 4 21:33:40 2006 From: brett at python.org (Brett Cannon) Date: Sat, 4 Nov 2006 12:33:40 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454CDAB0.5040409@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454CDAB0.5040409@v.loewis.de> Message-ID: On 11/4/06, "Martin v. L?wis" wrote: > > Fredrik Lundh schrieb: > >> However, I find the proposed behaviour reasonable: Python already > >> automatically imports the .pyc file if .py is not given and vice > >> versa. So why not look for .pyo if the .pyc file is not present? > > > > well, from a performance perspective, it would be nice if Python looked > > for *fewer* things, not more things. > > That's true. > > > (wouldn't transparent import of PYO files mean that you end up with a > > program where some assertions apply, and others don't? could be con- > > fusing...) > > That's also true, however, it might still be better to do that instead > of raising an ImportError. > > I'm not sure whether a scenario were you have only .pyo files for > some modules and only .pyc files for others is really likely, though, > and the performance hit of another system call doesn't sound attractive. > > So I guess that zipimport should stop importing .pyo files if > OptimizeFlag is false, then? Yes, I think it should. When I get around to rewriting zipimport for my import rewrite it will do this by default. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061104/40d3c521/attachment.html From brett at python.org Sat Nov 4 21:40:23 2006 From: brett at python.org (Brett Cannon) Date: Sat, 4 Nov 2006 12:40:23 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454CB619.7010804@v.loewis.de> <454CDAB0.5040409@v.loewis.de> Message-ID: On 11/4/06, Osvaldo Santana wrote: > > Hi, > > I'm the author of this patch and we are already using it in Python > port for Maemo platform. > > We are using .pyo modules mainly to remove docstrings from the > modules. We've discussed about this patch here[1] before. > > Now, I agree that the zipimport behaviour is incorrect but I don't > have other option to remove docstrings of a .pyc file. > > I'm planning to send a patch that adds a "--remove-docs" to the Python > interpreter to replace the "-OO" option that create only .pyo files. > > [1] http://mail.python.org/pipermail/python-dev/2005-November/057959.html The other option is to do away with .pyo files: http://www.python.org/dev/summary/2005-11-01_2005-11-15/#importing-pyc-and-pyo-files Guido has said he wouldn't mind it, but then .pyc files need to grow a field or so to be able to store what optimizations were used. While this would lead to more bytecode regeneration, it would help deal with this case and allow for more optimizations on the bytecode. -Brett On 11/4/06, "Martin v. L?wis" wrote: > > Fredrik Lundh schrieb: > > >> However, I find the proposed behaviour reasonable: Python already > > >> automatically imports the .pyc file if .py is not given and vice > > >> versa. So why not look for .pyo if the .pyc file is not present? > > > > > > well, from a performance perspective, it would be nice if Python > looked > > > for *fewer* things, not more things. > > > > That's true. 
> [cut] > > -- > Osvaldo Santana Neto (aCiDBaSe) > http://www.pythonologia.org > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: > http://mail.python.org/mailman/options/python-dev/brett%40python.org > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061104/5347f0af/attachment.htm From jcarlson at uci.edu Sat Nov 4 21:50:51 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sat, 04 Nov 2006 12:50:51 -0800 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: <454CE367.7000604@v.loewis.de> References: <454CE367.7000604@v.loewis.de> Message-ID: <20061104122150.81FF.JCARLSON@uci.edu> "Martin v. L?wis" wrote: > Paul Chiusano schrieb: > > To support this, the insert method needs to return a reference to an > > object which I can then pass to adjust_key() and delete() methods. > > It's extremely difficult to have this functionality with array-based > > heaps because the index of an item in the array changes as items are > > inserted and removed. > > I see. It is not required. If you are careful, you can implement a pairing heap with a structure combining a dictionary and list. It requires that all values be unique and hashable, but it is possible (I developed one for a commercial project). If other people find the need for it, I could rewrite it (can't release the closed source). It would use far less memory than the pairing heap implementation provided in the sandbox, and could be converted to C if desired and/or required. On the other hand, I've found the pure Python version to be fast enough for most things I've needed it for. 
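The closed-source implementation Josiah mentions isn't shown anywhere in the thread, but the dictionary-plus-list idea he describes can be sketched roughly as follows. This is an illustrative reconstruction, not his code; the class name, method names, and API are invented here:

```python
class IndexedHeap:
    """Min-heap over unique, hashable values.

    Entries live in a plain list in heap order; a dict maps each value
    to its current index, so adjust_key() and delete() can find an
    entry in O(1) and restore the heap invariant in O(log n).
    """

    def __init__(self):
        self.heap = []       # [key, value] entries, heap-ordered by key
        self.position = {}   # value -> index of its entry in self.heap

    def insert(self, key, value):
        self.heap.append([key, value])
        self.position[value] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def peek(self):
        return tuple(self.heap[0])

    def extract(self):
        key, value = self.heap[0]
        self.delete(value)
        return key, value

    def adjust_key(self, value, new_key):
        i = self.position[value]
        old_key = self.heap[i][0]
        self.heap[i][0] = new_key
        if new_key < old_key:
            self._sift_up(i)
        else:
            self._sift_down(i)

    def delete(self, value):
        i = self.position.pop(value)
        last = self.heap.pop()
        if i < len(self.heap):       # deleted entry was not the tail
            self.heap[i] = last
            self.position[last[1]] = i
            self._sift_down(i)
            self._sift_up(i)

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.position[self.heap[i][1]] = i
        self.position[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i and self.heap[i][0] < self.heap[(i - 1) // 2][0]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            child = 2 * i + 1
            if child >= n:
                return
            if child + 1 < n and self.heap[child + 1][0] < self.heap[child][0]:
                child += 1
            if self.heap[child][0] >= self.heap[i][0]:
                return
            self._swap(i, child)
            i = child
```

Because every swap updates the position dict, adjust_key() and delete() never have to search the list for an entry, which is exactly the property array-based heaps were said to lack.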
- Josiah From greg.ewing at canterbury.ac.nz Sun Nov 5 02:21:34 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Sun, 05 Nov 2006 14:21:34 +1300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454CB619.7010804@v.loewis.de> Message-ID: <454D3C9E.5030505@canterbury.ac.nz> Fredrik Lundh wrote: > well, from a performance perspective, it would be nice if Python looked > for *fewer* things, not more things. Instead of searching for things by doing a stat call for each possible file name, would it perhaps be faster to read the contents of all the directories along sys.path into memory and then go searching through that? -- Greg From exarkun at divmod.com Sun Nov 5 02:37:32 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Sat, 4 Nov 2006 20:37:32 -0500 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454D3C9E.5030505@canterbury.ac.nz> Message-ID: <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> On Sun, 05 Nov 2006 14:21:34 +1300, Greg Ewing wrote: >Fredrik Lundh wrote: > >> well, from a performance perspective, it would be nice if Python looked >> for *fewer* things, not more things. > >Instead of searching for things by doing a stat call >for each possible file name, would it perhaps be >faster to read the contents of all the directories >along sys.path into memory and then go searching >through that? Bad for large directories. There's a cross-over at some number of entries. Maybe Python should have a runtime-tuned heuristic for selecting a filesystem traversal mechanism. 
Jean-Paul From martin at v.loewis.de Sun Nov 5 04:14:11 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Nov 2006 04:14:11 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454D3C9E.5030505@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> Message-ID: <454D5703.5070509@v.loewis.de> Greg Ewing schrieb: > Fredrik Lundh wrote: > >> well, from a performance perspective, it would be nice if Python looked >> for *fewer* things, not more things. > > Instead of searching for things by doing a stat call > for each possible file name, would it perhaps be > faster to read the contents of all the directories > along sys.path into memory and then go searching > through that? That should never be better: the system will cache the directory blocks, also, and it will do a better job than Python will. Regards, Martin From brett at python.org Sun Nov 5 08:28:59 2006 From: brett at python.org (Brett Cannon) Date: Sat, 4 Nov 2006 23:28:59 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> References: <454D3C9E.5030505@canterbury.ac.nz> <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> Message-ID: On 11/4/06, Jean-Paul Calderone wrote: > > On Sun, 05 Nov 2006 14:21:34 +1300, Greg Ewing < > greg.ewing at canterbury.ac.nz> wrote: > >Fredrik Lundh wrote: > > > >> well, from a performance perspective, it would be nice if Python looked > >> for *fewer* things, not more things. > > > >Instead of searching for things by doing a stat call > >for each possible file name, would it perhaps be > >faster to read the contents of all the directories > >along sys.path into memory and then go searching > >through that? > > Bad for large directories. There's a cross-over at some number > of entries. 
Maybe Python should have a runtime-tuned heuristic > for selecting a filesystem traversal mechanism. Hopefully my import rewrite is flexible enough that people will be able to plug in their own importer/loader for the filesystem so that they can tune how things like this are handled (e.g., caching what files are in a directory, skipping bytecode files, etc.). -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061104/7a45ac89/attachment.html From steve at holdenweb.com Sun Nov 5 10:13:38 2006 From: steve at holdenweb.com (Steve Holden) Date: Sun, 05 Nov 2006 09:13:38 +0000 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454D3C9E.5030505@canterbury.ac.nz> <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> Message-ID: [Off-list] Brett Cannon wrote: [...] > > Hopefully my import rewrite is flexible enough that people will be able > to plug in their own importer/loader for the filesystem so that they can > tune how things like this are handled (e.g., caching what files are in a > directory, skipping bytecode files, etc.). > I just wondered whether you plan to support other importers of the PEP 302 style? I have been experimenting with import from database, and would like to see that work migrate to your rewrite if possible. 
regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden

From stephen at xemacs.org Sun Nov 5 10:10:44 2006 From: stephen at xemacs.org (stephen at xemacs.org) Date: Sun, 05 Nov 2006 18:10:44 +0900 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> Message-ID: <87y7qqqmwb.fsf@uwakimon.sk.tsukuba.ac.jp> Michael Urman writes: > Ah, but how do you know when that's wrong? At least under ftp:// your > root is often a mid-level directory until you change up out of it. > http:// will tend to treat the targets as roots, but I don't know that > there's any requirement for a /.. to be meaningless (even if it often > is). ftp and http schemes both have authority ("host") components, so the meaning of ".." path components is defined in the same way for both by section 5 of RFC 3986. Of course an FTP server is not bound to interpret the protocol so as to mimic URL semantics. But that's a different question.

From dalke at dalkescientific.com Sun Nov 5 12:23:25 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 5 Nov 2006 12:23:25 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> Message-ID: Steve: > > I'm darned if I know. I simply know that it isn't right for http resources. /F: > the URI specification disagrees; an URI that starts with "../" is perfectly legal, and the specification explicitly states how it should be interpreted.
I have looked at the spec, and can't figure out how its explanation matches the observed urljoin results. Steve's excerpt trimmed out the strangest example.

>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../")
'http://blah.com/../'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..")  # What?!
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../../")
'http://blah.com/../../'
>>>

> (it's important to realize that "urijoin" produces equivalent URI:s, not > file names)

Both, though, are "paths". The OP, Mike Orr, wrote: I agree that supporting non-filesystem directories (zip files, CSV/Subversion sandboxes, URLs) would be nice, but we already have a big enough project without that. What constraints should a Path object keep in mind in order to be forward-compatible with this? Is the answer therefore that URLs and URI behaviour should not place constraints on a Path object because they are sufficiently dissimilar from file-system paths? Do these other non-FS hierarchical structures have similar differences causing a semantic mismatch? Andrew dalke at dalkescientific.com

From paul.chiusano at gmail.com Sat Nov 4 19:18:02 2006 From: paul.chiusano at gmail.com (Paul Chiusano) Date: Sat, 4 Nov 2006 13:18:02 -0500 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: <454C514C.9000602@v.loewis.de> References: <454C514C.9000602@v.loewis.de> Message-ID: Hi Martin, Yes, I'm familiar with the heapq module, but it doesn't do all that I'd like. The main functionality I am looking for is the ability to adjust the value of an item in the heap and delete items from the heap. There's a lot of heap applications where this is useful. (I might even say most heap applications!) To support this, the insert method needs to return a reference to an object which I can then pass to adjust_key() and delete() methods.
It's extremely difficult to have this functionality with array-based heaps because the index of an item in the array changes as items are inserted and removed. I guess I don't need a pairing heap, but of the pointer-based heaps I've looked at, pairing heaps seem to be the simplest while still having good complexity guarantees. > Anyway, the immediate author of this code is Dan Stutzbach (as > Raymond Hettinger's checkin message says); you probably should > contact him to find out whether the project is still alive. Okay, I'll do that. What needs to be done to move the project along and possibly get a pairing heap incorporated into a future version of python? Best, Paul On 11/4/06, "Martin v. L?wis" wrote: > Paul Chiusano schrieb: > > I was looking for a good pairing_heap implementation and came across > > one that had apparently been checked in a couple years ago (!). > > Have you looked at the heapq module? What application do you have > for a pairing heap that you can't do readily with the heapq module? > > Anyway, the immediate author of this code is Dan Stutzbach (as > Raymond Hettinger's checkin message says); you probably should > contact him to find out whether the project is still alive. > > Regards, > Martin > From aahz at pythoncraft.com Sun Nov 5 17:24:58 2006 From: aahz at pythoncraft.com (Aahz) Date: Sun, 5 Nov 2006 08:24:58 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454D5703.5070509@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> Message-ID: <20061105162458.GA23812@panix.com> On Sun, Nov 05, 2006, "Martin v. L?wis" wrote: > Greg Ewing schrieb: >> Fredrik Lundh wrote: >>> >>> well, from a performance perspective, it would be nice if Python looked >>> for *fewer* things, not more things. 
>> >> Instead of searching for things by doing a stat call for each >> possible file name, would it perhaps be faster to read the contents >> of all the directories along sys.path into memory and then go >> searching through that? > > That should never be better: the system will cache the directory > blocks, also, and it will do a better job than Python will. Maybe so, but I recently dealt with a painful bottleneck in Python code caused by excessive stat() calls on a directory with thousands of files, while the os.listdir() function was bogging things down hardly at all. Granted, Python bytecode was almost certainly the cause of much of the overhead, but I still suspect that a simple listing will be faster in C code because of fewer system calls. It should be a matter of profiling before this suggestion is rejected rather than making assertions about what "should" be happening. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it." --Tim Peters on Python, 16 Sep 1993 From sluggoster at gmail.com Sun Nov 5 17:59:33 2006 From: sluggoster at gmail.com (Mike Orr) Date: Sun, 5 Nov 2006 08:59:33 -0800 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> Message-ID: <6e9196d20611050859x7b39410eyb7de882d52713631@mail.gmail.com> On 11/5/06, Andrew Dalke wrote: > > I agree that supporting non-filesystem directories (zip files, > CSV/Subversion sandboxes, URLs) would be nice, but we already have a > big enough project without that. What constraints should a Path > object keep in mind in order to be forward-compatible with this? 
> > Is the answer therefore that URLs and URI behaviour should not > place constraints on a Path object because they are sufficiently > dissimilar from file-system paths? Do these other non-FS hierarchical > structures have similar differences causing a semantic mismatch?

This discussion has reinforced my belief that os.path.join's behavior is correct with non-initial absolute args: os.path.join('/usr/bin', '/usr/local/bin/python') I've used that in applications and haven't found it a burden. Its behavior with '..' seems justifiable too, and Talin's trick of wrapping everything in os.path.normpath is a great one.

I do think join should take more care to avoid multiple slashes together in the middle of a path, although this is really the responsibility of the platform library, not a generic function/method. Join is true to its documentation of only adding separators and never deleting them, but that seems like a bit of sloppiness. On the other hand, the filesystems don't care; I don't think anybody has mentioned a case where it actually creates a path the filesystem can't handle.

urljoin clearly has a different job. When we talked about extending path to URLs, I was thinking more in terms of opening files, fetching resources, deleting, renaming, etc. rather than split-modify-rejoin. A hypothetical urlpath module would clearly have to follow the URL rules. I don't see a contradiction in supporting both URL joining rules and having a non-initial absolute argument, just to avoid cross-"platform" surprises. But urlpath would also need methods to parse the scheme and host on demand, query strings, #fragments, a class method for building a URL from the smallest parts, etc.

As for supporting path fragments and '..' in join arguments (for filesystem paths), it's clearly too widely used to eliminate. Users can voluntarily refrain from passing arguments containing separators.
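The join semantics defended here are easy to check. The snippet below uses posixpath (the POSIX flavor of os.path) so the results don't depend on the host platform; safe_join() is a hypothetical helper of the kind this discussion alludes to, not an existing stdlib function:

```python
import posixpath  # POSIX flavor of os.path, for platform-independent results

# A non-initial absolute argument discards everything before it:
print(posixpath.join('/usr/bin', '/usr/local/bin/python'))  # /usr/local/bin/python

# join() only inserts separators; collapsing '..' is left to normpath()
# (the "wrap everything in os.path.normpath" trick):
p = posixpath.join('/usr/local/lib', '../bin')
print(p)                      # /usr/local/lib/../bin
print(posixpath.normpath(p))  # /usr/local/bin

def safe_join(base, *parts):
    """Hypothetical helper for untrusted arguments: reject absolute,
    empty, separator-bearing, and '.'/'..' components outright."""
    for part in parts:
        if not part or posixpath.isabs(part) or '/' in part or part in ('.', '..'):
            raise ValueError('unsafe path component: %r' % (part,))
    return posixpath.join(base, *parts)
```

With a helper like this, a hostile argument such as '../etc/passwd' raises ValueError instead of silently escaping the base directory.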
For cases involving a user-supplied -- possibly hostile -- path, either a separate method (safe_join, child) could achieve this, or a subclass implementation that allows only safe arguments.

Regarding pathname-manipulation methods and filesystem-access methods, I'm not sure how workable it is to have separate objects for them. os.mkdir( Path("/usr/local/lib/python/Cheetah/Template.py").parent ) Path("/usr/local/lib/python/Cheetah/Template.py").parent.mkdir() FileAccess( Path("/usr/local/lib/python/Cheetah/Template.py").parent ).mkdir() The first two are reasonable. The third... who would want to do this for every path? How often would you reuse the FileAccess object? I typically create Path objects from configuration values and keep them around for the entire application; e.g., data_dir. Then I create derived paths as necessary. I suppose if the FileAccess object has a .path attribute, it could do double-duty so you wouldn't have to store the path separately. Is this what the advocates of two classes have in mind? With usage like this? my_file = FileAccess( file_access_obj.path.joinpath("my_file") ) my_file = FileAccess( Path(file_access_obj.path, "my_file") )

Working on my Path implementation. (Yes it's necessary, Glyph, at least to me.) It's going slow because I just got a Macintosh laptop and am still rounding up packages to install. -- Mike Orr

From jcarlson at uci.edu Sun Nov 5 19:24:45 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Sun, 05 Nov 2006 10:24:45 -0800 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: References: <20061104122150.81FF.JCARLSON@uci.edu> Message-ID: <20061105095621.8212.JCARLSON@uci.edu> "Paul Chiusano" wrote: > > > It is not required. If you are careful, you can implement a pairing > > heap with a structure combining a dictionary and list. > > That's interesting. Can you give an overview of how you can do that? I > can't really picture it.
You can support all the pairing heap > operations with the same complexity guarantees? Do you mean a linked > list here or an array?

I mean a Python list. The trick is to implement a sequence API that keeps track of the position of any 'pair'. That is, ph[posn] will return a 'pair' object, but when you perform ph[posn] = pair, you also update a mapping; ph.mapping[pair.value] = posn . With a few other bits, one can use heapq directly and get all of the features of the pairing heap API without keeping an explicit tree with links, etc.

In terms of running time, adjust_key, delete, and extract(0) are all O(log n), meld is O(min(n+m, m log(n+m))), empty and peek are O(1), values is O(n), and extract_all is O(n log n) but uses list.sort() rather than repeatedly pulling from the heap (heapq's documentation suggests this is faster in terms of comparisons, but likely very much faster in terms of actual running time).

Attached is a sample implementation using this method with a small test example. It may or may not use less memory than the sandbox pairing_heap.py, and using bare lists rather than pairs may result in less memory overall (if there exists a list "free list"), but this should give you something to start with. - Josiah

> Paul > > On 11/4/06, Josiah Carlson wrote: > > > > "Martin v. Löwis" wrote: > > > Paul Chiusano schrieb: > > > > To support this, the insert method needs to return a reference to an > > > > object which I can then pass to adjust_key() and delete() methods. > > > > It's extremely difficult to have this functionality with array-based > > > > heaps because the index of an item in the array changes as items are
> > > > If other people find the need for it, I could rewrite it (can't release > > the closed source). It would use far less memory than the pairing heap > > implementation provided in the sandbox, and could be converted to C if > > desired and/or required. On the other hand, I've found the pure Python > > version to be fast enough for most things I've needed it for. > > > > - Josiah > > > > -------------- next part -------------- A non-text attachment was scrubbed... Name: pair_heap.py Type: application/octet-stream Size: 5377 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061105/62f57d3a/attachment.obj From martin at v.loewis.de Sun Nov 5 20:22:13 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Nov 2006 20:22:13 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> Message-ID: <454E39E5.8040604@v.loewis.de> Andrew Dalke schrieb: > I have looked at the spec, and can't figure out how its explanation > matches the observed urljoin results. Steve's excerpt trimmed out > the strangest example. Unfortunately, you didn't say which of these you want explained. As it is tedious to write down even a single one, I restrain to the one with the What?! remark. >>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..") # What?! > 'http://blah.com/' Please follow me through section 5 of http://www.ietf.org/rfc/rfc3986.txt 5.2.1: Pre-parse the Base URI B.scheme = "http" B.authority = "blah.com" B.path = "/a/b/c" B.query = undefined B.fragment = undefined 5.2.2: Transform References parse("../../../..") R.scheme = R.authority = R.query = R.fragment = undefined R.path = "../../../.." 
(strictness not relevant, R.scheme is already undefined) R.scheme is not defined R.authority is not defined R.path is not "" R.path does not start with / T.path = merge("/a/b/c", "../../../..") T.path = remove_dot_segments(T.path) T.authority = "blah.com" T.scheme = "http" T.fragment = undefined 5.2.3 Merge paths merge("/a/b/c", "../../../..") = (base URI does have path) "/a/b/../../../.." 5.2.4 Remove Dot Segments remove_dot_segments("/a/b/../../../..") 1. I = "/a/b/../../../.." O = "" 2. A (does not apply) B (does not apply) C (does not apply) D (does not apply) E O="/a" I="/b/../../../.." 2. E O="/a/b" I="/../../../.." 2. C O="/a" I="/../../.." 2. C O="" I="/../.." 2. C O="" I="/.." 2. C O="" I="/" 2. E O="/" I="" 3. Result: "/" 5.3 Component Recomposition result = "" (scheme is defined) result = "http:" (authority is defined) result = "http://blah.com" (append path) result = "http://blah.com/" HTH, Martin From brett at python.org Sun Nov 5 21:07:03 2006 From: brett at python.org (Brett Cannon) Date: Sun, 5 Nov 2006 12:07:03 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: References: <454D3C9E.5030505@canterbury.ac.nz> <20061105013732.20948.1283244333.divmod.quotient.13773@ohm> Message-ID: On 11/5/06, Steve Holden wrote: > > [Off-list] > Brett Cannon wrote: > [...] > > > > Hopefully my import rewrite is flexible enough that people will be able > > to plug in their own importer/loader for the filesystem so that they can > > tune how things like this are handled (e.g., caching what files are in a > > directory, skipping bytecode files, etc.). > > > I just wondered whether you plan to support other importers of the PEP > 302 style? I have been experimenting with import from database, and > would like to see that work migrate to your rewrite if possible. Yep. The main point of this rewrite is to refactor the built-in importers to be PEP 302 importers so that they can easily be left out to protect imports. 
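The PEP 302 protocol being discussed here is small enough to sketch directly. The DictImporter below is a hypothetical stand-in for a database-backed importer — the dict maps module names to source text, and find_module()/load_module() follow the PEP 302 loader contract (reuse sys.modules, register the module before executing its code, clean up on failure). The class name and sample module are invented for illustration:

```python
import sys
import types

class DictImporter:
    """Hypothetical PEP 302 finder/loader; self.sources stands in for a
    database table mapping module name -> source text."""

    def __init__(self, sources):
        self.sources = sources

    def find_module(self, fullname, path=None):
        # Return a loader (here: ourselves) if we can handle this module.
        return self if fullname in self.sources else None

    def load_module(self, fullname):
        if fullname in sys.modules:        # PEP 302: reuse a cached module
            return sys.modules[fullname]
        mod = types.ModuleType(fullname)
        mod.__loader__ = self
        sys.modules[fullname] = mod        # register *before* executing
        try:
            exec(self.sources[fullname], mod.__dict__)
        except BaseException:
            del sys.modules[fullname]      # PEP 302: remove on failure
            raise
        return mod

finder = DictImporter({'dbmod': 'answer = 42\n'})
loader = finder.find_module('dbmod')
mod = loader.load_module('dbmod')
print(mod.answer)  # 42
```

Appending the finder to sys.meta_path would let plain import statements reach it on interpreters that still honor the legacy find_module protocol; the sketch calls the protocol methods directly to stay version-independent.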
Plus I have made sure that doing something like .ptl files off the filesystem is simple (a subclass with a single method overloaded) or introducing a DB as a back-end store (should only require the importer/loader part; can even use an existing class to handle whether bytecode should be recreated or not). Since a DB back-end is a specific use-case I even have notes in the module docstring stating how I would go about doing it. -Brett -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061105/2931109c/attachment.html From martin at v.loewis.de Sun Nov 5 21:36:51 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 05 Nov 2006 21:36:51 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061105162458.GA23812@panix.com> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <20061105162458.GA23812@panix.com> Message-ID: <454E4B63.2020603@v.loewis.de> Aahz schrieb: > Maybe so, but I recently dealt with a painful bottleneck in Python code > caused by excessive stat() calls on a directory with thousands of files, > while the os.listdir() function was bogging things down hardly at all. > Granted, Python bytecode was almost certainly the cause of much of the > overhead, but I still suspect that a simple listing will be faster in C > code because of fewer system calls. It should be a matter of profiling > before this suggestion is rejected rather than making assertions about > what "should" be happening. That works both ways, of course: whoever implements such a patch should also provide profiling information. Last time I changed the importing code to reduce the number of stat calls, I could hardly demonstrate a speedup. 
Regards, Martin

From dalke at dalkescientific.com Sun Nov 5 23:29:13 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 5 Nov 2006 23:29:13 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <454E39E5.8040604@v.loewis.de> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> Message-ID:

Martin: > Unfortunately, you didn't say which of these you want explained. > As it is tedious to write down even a single one, I restrain to the > one with the What?! remark. > > >>>> urlparse.urljoin("http://blah.com/a/b/c", "../../../..") # What?! > > 'http://blah.com/'

The "What?!" is in context with the previous and next entries. I've reduced it to a simpler case

>>> urlparse.urljoin("http://blah.com/", "..")
'http://blah.com/'
>>> urlparse.urljoin("http://blah.com/", "../")
'http://blah.com/../'
>>> urlparse.urljoin("http://blah.com/", "../..")
'http://blah.com/'

Does the result make sense to you? Does it make sense that the last of these is shorter than the middle one? It sure doesn't to me. I thought it was obvious that there was an error; obvious enough that I didn't bother to track down why - especially as my main point was to argue there are different ways to deal with hierarchical/path-like schemes, each correct for its given domain.

> Please follow me through section 5 of > > http://www.ietf.org/rfc/rfc3986.txt

The core algorithm causing the "what?!" comes from "remove_dot_segments", section 5.2.4. In parallel my 3 cases should give:

5.2.4 Remove Dot Segments

remove_dot_segments("/.."):
  1.  I="/..", O=""
  2A. (does not apply)
  2B. (does not apply)
  2C. O="", I="/"
  2A. (does not apply)
  2B. (does not apply)
  2C. (does not apply)
  2D. (does not apply)
  2E. O="/", I=""
  3.  Result: "/"

remove_dot_segments("/../"):
  1.  I="/../", O=""
  2A. (does not apply)
  2B. (does not apply)
  2C. O="", I="/"
  2A. (does not apply)
  2B. (does not apply)
  2C. (does not apply)
  2D. (does not apply)
  2E. O="/", I=""
  3.  Result: "/"

remove_dot_segments("/../.."):
  1.  I="/../..", O=""
  2A. (does not apply)
  2B. (does not apply)
  2C. O="", I="/.."
      .. which reduces to remove_dot_segments("/..")
  3.  Result: "/"

My reading of the RFC 3986 says all three examples should produce the same result. The fact that my "what?!" comment happens to be correct according to that RFC is purely coincidental. Then again, urlparse.py does *not* claim to be RFC 3986 compliant. The module docstring is """Parse (absolute and relative) URLs. See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, UC Irvine, June 1995. """

I tried the same code with 4Suite, which does claim compliance, and get

>>> import Ft
>>> from Ft.Lib import Uri
>>> Uri.Absolutize("..", "http://blah.com/")
'http://blah.com/'
>>> Uri.Absolutize("../", "http://blah.com/")
'http://blah.com/'
>>> Uri.Absolutize("../..", "http://blah.com/")
'http://blah.com/'
>>>

The text of its Uri.py says: This function is similar to urlparse.urljoin() and urllib.basejoin(). Those functions, however, are (as of Python 2.3) outdated, buggy, and/or designed to produce results acceptable for use with other core Python libraries, rather than being earnest implementations of the relevant specs. Their problems are most noticeable in their handling of same-document references and 'file:' URIs, both being situations that come up far too often to consider the functions reliable enough for general use. """

# Reasons to avoid using urllib.basejoin() and urlparse.urljoin():
# - Both are partial implementations of long-obsolete specs.
# - Both accept relative URLs as the base, which no spec allows.
# - urllib.basejoin() mishandles the '' and '..' references.
# - If the base URL uses a non-hierarchical or relative path,
#   or if the URL scheme is unrecognized, the result is not
#   always as expected (partly due to issues in RFC 1808).
# - If the authority component of a 'file' URI is empty,
#   the authority component is removed altogether. If it was
#   not present, an empty authority component is in the result.
# - '.' and '..' segments are not always collapsed as well as they
#   should be (partly due to issues in RFC 1808).
# - Effective Python 2.4, urllib.basejoin() *is* urlparse.urljoin(),
#   but urlparse.urljoin() is still based on RFC 1808.

In searching the archives http://mail.python.org/pipermail/python-dev/2005-September/056152.html Fabien Schwob: > I'm using the module urlparse and I think I've found a bug in the > urlparse module. When you merge an url and a link > like "../../../page.html" with urljoin, the new url created keep some > "../" in it. Here is an example : > > >>> import urlparse > >>> begin = "http://www.example.com/folder/page.html" > >>> end = "../../../otherpage.html" > >>> urlparse.urljoin(begin, end) > 'http://www.example.com/../../otherpage.html' Guido: > You shouldn't be giving more "../" sequences than are possible. I find > the current behavior acceptable.

(Apparently for RFC 1808 that's a valid answer; it was an implementation choice in how to handle that case.)

While not directly relevant, postings like John J Lee's http://mail.python.org/pipermail/python-bugs-list/2006-February/031875.html > The urlparse.urlparse() code should not be changed, for > backwards compatibility reasons. strongly suggest a desire to not change that code.

The last definitive statement on this topic that I could find was mentioned in http://www.python.org/dev/summary/2005-11-16_2005-11-30/#updating-urlparse-to-support-rfc-3986 > Guido pointed out that the main purpose of urlparse is to be RFC-compliant. > Paul explained that the current code is valid according to RFC 1808 > (1995-1998), but that this was superseded by RFC 2396 (1998-2004) > and RFC 3986 (2005-). Guido was convinced, and asked for a new API > (for backwards compatibility) and a patch to be submitted via sourceforge.
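For reference, the RFC 3986 section 5.2.4 algorithm that both traces in this thread walk through is short enough to transcribe into Python. This is an illustrative sketch of the spec's pseudocode, not the stdlib's code; with it, all three disputed relative references reduce to the same path:

```python
def remove_dot_segments(path):
    """Direct transcription of RFC 3986, section 5.2.4.

    The step labels (2A-2E) match the lettered cases in the RFC and in
    the traces quoted in this thread.
    """
    output = []                        # completed segments (the RFC's buffer O)
    while path:                        # the RFC's input buffer I
        if path.startswith('../'):     # 2A
            path = path[3:]
        elif path.startswith('./'):    # 2A
            path = path[2:]
        elif path.startswith('/./'):   # 2B
            path = '/' + path[3:]
        elif path == '/.':             # 2B
            path = '/'
        elif path.startswith('/../'):  # 2C: drop the last output segment
            path = '/' + path[4:]
            if output:
                output.pop()
        elif path == '/..':            # 2C
            path = '/'
            if output:
                output.pop()
        elif path in ('.', '..'):      # 2D
            path = ''
        else:                          # 2E: move first segment to output
            cut = path.find('/', 1)
            if cut == -1:
                output.append(path)
                path = ''
            else:
                output.append(path[:cut])
                path = path[cut:]
    return ''.join(output)

for ref in ('/..', '/../', '/../..'):
    print(ref, '->', remove_dot_segments(ref))  # all three give '/'
```

Merging any of the three references against "http://blah.com/" therefore recomposes to 'http://blah.com/', matching 4Suite's output rather than urlparse's.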
As this is not a bug, I have added the feature request 1591035 to SF
titled "update urlparse to RFC 3986". Nothing else appeared to exist on
that specific topic.

Andrew
dalke at dalkescientific.com

From martin at v.loewis.de  Mon Nov  6 00:06:07 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 06 Nov 2006 00:06:07 +0100
Subject: [Python-Dev] Path object design
In-Reply-To:
References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com>
	<454BD1A9.8080508@v.loewis.de>
	<5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com>
	<454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de>
Message-ID: <454E6E5F.7070800@v.loewis.de>

Andrew Dalke schrieb:
>>>> urlparse.urljoin("http://blah.com/", "..")
> 'http://blah.com/'
>>>> urlparse.urljoin("http://blah.com/", "../")
> 'http://blah.com/../'
>>>> urlparse.urljoin("http://blah.com/", "../..")
> 'http://blah.com/'
>
> Does the result make sense to you? Does it make
> sense that the last of these is shorter than the middle
> one? It sure doesn't to me. I thought it was obvious
> that there was an error;

That wasn't obvious at all to me. Now looking at the examples, I agree
there is an error. The middle one is incorrect;

urlparse.urljoin("http://blah.com/", "../")

should also give 'http://blah.com/'.

>> You shouldn't be giving more "../" sequences than are possible. I find
>> the current behavior acceptable.
>
> (Apparently for RFC 1808 that's a valid answer; it was an implementation
> choice in how to handle that case.)

There is still some text to that effect in section 5.4.2 of RFC 3986.

> While not directly relevant, postings like John J Lee's
> http://mail.python.org/pipermail/python-bugs-list/2006-February/031875.html
>> The urlparse.urlparse() code should not be changed, for
>> backwards compatibility reasons.
>
> strongly suggest a desire to not change that code.

This is John J Lee's opinion, of course.
I don't see a reason not to fix such bugs, or to update the implementation to the current RFCs. > As this is not a bug, I have added the feature request 1591035 to SF > titled "update urlparse to RFC 3986". Nothing else appeared to exist > on that specific topic. Thanks. It always helps to be more specific; being less specific often hurts. I find there is a difference between "urllib behaves non-intuitively" and "urllib gives result A for parameters B and C, but should give result D instead". Can you please add specific examples to your report that demonstrate the difference between implemented and expected behavior? Regards, Martin From greg.ewing at canterbury.ac.nz Mon Nov 6 00:21:26 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Nov 2006 12:21:26 +1300 Subject: [Python-Dev] [Python-3000] Mini Path object In-Reply-To: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> References: <6e9196d20611012247w51d740fm68116bd98b6591d9@mail.gmail.com> Message-ID: <454E71F6.7090103@canterbury.ac.nz> Mike Orr wrote: > .abspath() > .normpath() > .realpath() > .splitpath() > .relpath() > .relpathto() Seeing as the whole class is about paths, having "path" in the method names seems redundant. I'd prefer to see terser method names without any noise characters in them. -- Greg From greg.ewing at canterbury.ac.nz Mon Nov 6 00:34:05 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Nov 2006 12:34:05 +1300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454D5703.5070509@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> Message-ID: <454E74ED.8070706@canterbury.ac.nz> Martin v. L?wis wrote: > That should never be better: the system will cache the directory > blocks, also, and it will do a better job than Python will. 
If that's really the case, then why do discussions of how to improve
Python startup speeds seem to focus on the number of stat calls made?

Also, caching isn't the only thing to consider. Last time I looked at
the implementation of unix file systems, they mostly seemed to do
directory lookups by linear search. Unless that's changed a lot, I have
a hard time seeing how that's going to beat Python's highly-tuned
dictionaries.

--
Greg

From dalke at dalkescientific.com  Mon Nov  6 00:43:42 2006
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Mon, 6 Nov 2006 00:43:42 +0100
Subject: [Python-Dev] Path object design
In-Reply-To: <454E6E5F.7070800@v.loewis.de>
References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com>
	<5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com>
	<454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de>
	<454E6E5F.7070800@v.loewis.de>
Message-ID:

Me [Andrew]:
> > As this is not a bug, I have added the feature request 1591035 to SF
> > titled "update urlparse to RFC 3986". Nothing else appeared to exist
> > on that specific topic.

Martin:
> Thanks. It always helps to be more specific; being less specific often
> hurts.

So does being more specific. I wasn't trying to report a bug in
urlparse. I figured everyone knew the problems existed. The code
comments say so, and various back discussions on this list say so.

All I wanted to do was point out that two seemingly similar problems -
path traversal of hierarchical structures - had two different expected
behaviors. Now I've spent entirely too much time on specifics I didn't
care about and didn't think were important. I've also been known to do
the full report and have people ignore what I wrote because it was too
long.

> I find there is a difference between "urllib behaves
> non-intuitively" and "urllib gives result A for parameters B and C,
> but should give result D instead".
Can you please add specific examples > to your report that demonstrate the difference between implemented > and expected behavior? No. I consider the "../" cases to be unimportant edge cases and I would rather people fixed the other problems highlighted in the text I copied from 4Suite's Uri.py -- like improperly allowing a relative URL as the base url, which I incorrectly assumed was legit - and that others have reported on python-dev, easily found with Google. If I only add test cases for "../" then I believe that that's all that will be fixed. Given the back history of this problem and lack of followup I also believe it won't be fixed unless someone develops a brand new module, from scratch, which will be added to some future Python version. There's probably a compliance suite out there to use for this sort of task. I hadn't bothered to look as I am no more proficient than others here at Google. Finally, I see that my report is a dup. SF search is poor. As Nick Coghlan reported, Paul Jimenez has a replacement for urlparse. Summarized in http://www.python.org/dev/summary/2006-04-01_2006-04-15/ It was submitted in spring as a patch - SF# 1462525 at http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470 which I didn't find in my earlier searching. Andrew dalke at dalkescientific.com From greg.ewing at canterbury.ac.nz Mon Nov 6 01:07:55 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Mon, 06 Nov 2006 13:07:55 +1300 Subject: [Python-Dev] idea for data-type (data-format) PEP In-Reply-To: <45491492.9060208@ee.byu.edu> References: <4548DDFD.5030604@v.loewis.de> <4548FA58.4050702@v.loewis.de> <4549010F.6090200@ieee.org> <45490989.9010603@v.loewis.de> <45491492.9060208@ee.byu.edu> Message-ID: <454E7CDB.2000402@canterbury.ac.nz> Travis Oliphant wrote: > In NumPy, the data-type objects have function pointers to accomplish all > the things NumPy does quickly. 
If the datatype object is to be extracted and made a stand-alone feature, that might need to be refactored. Perhaps there could be a facility for traversing a datatype with a user-supplied dispatch table? -- Greg From foom at fuhm.net Mon Nov 6 04:08:42 2006 From: foom at fuhm.net (James Y Knight) Date: Sun, 5 Nov 2006 22:08:42 -0500 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: <454C5431.7080609@v.loewis.de> References: <454B1EDD.9050908@googlemail.com> <454C5431.7080609@v.loewis.de> Message-ID: On Nov 4, 2006, at 3:49 AM, Martin v. L?wis wrote: > Notice that at least the following objects are shared between > interpreters, as they are singletons: > - None, True, False, (), "", u"" > - strings of length 1, Unicode strings of length 1 with ord < 256 > - integers between -5 and 256 > How do you deal with the reference counters of these objects? > > Also, type objects (in particular exception types) are shared between > interpreters. These are mutable objects, so you have actually > dictionaries shared between interpreters. How would you deal with > these? All these should be dealt with by making them per-interpreter singletons, not per address space. That should be simple enough, unfortunately the margins of this email are too small to describe how. ;) Also it'd be backwards incompatible with current extension modules. James From guido at python.org Mon Nov 6 05:52:46 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 5 Nov 2006 20:52:46 -0800 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: References: <454B1EDD.9050908@googlemail.com> <454C5431.7080609@v.loewis.de> Message-ID: On 11/5/06, James Y Knight wrote: > > On Nov 4, 2006, at 3:49 AM, Martin v. 
L?wis wrote: > > > Notice that at least the following objects are shared between > > interpreters, as they are singletons: > > - None, True, False, (), "", u"" > > - strings of length 1, Unicode strings of length 1 with ord < 256 > > - integers between -5 and 256 > > How do you deal with the reference counters of these objects? > > > > Also, type objects (in particular exception types) are shared between > > interpreters. These are mutable objects, so you have actually > > dictionaries shared between interpreters. How would you deal with > > these? > > All these should be dealt with by making them per-interpreter > singletons, not per address space. That should be simple enough, > unfortunately the margins of this email are too small to describe > how. ;) Also it'd be backwards incompatible with current extension > modules. I don't know how you define simple. In order to be able to have separate GILs you have to remove *all* sharing of objects between interpreters. And all other data structures, too. It would probably kill performance too, because currently obmalloc relies on the GIL. So I don't see much point in continuing this thread. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From talin at acm.org Mon Nov 6 07:27:52 2006 From: talin at acm.org (Talin) Date: Sun, 05 Nov 2006 22:27:52 -0800 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: References: <454B1EDD.9050908@googlemail.com> <454C5431.7080609@v.loewis.de> Message-ID: <454ED5E8.4010709@acm.org> Guido van Rossum wrote: > I don't know how you define simple. In order to be able to have > separate GILs you have to remove *all* sharing of objects between > interpreters. And all other data structures, too. It would probably > kill performance too, because currently obmalloc relies on the GIL. Nitpick: You have to remove all sharing of *mutable* objects. 
One day, when we get "pure" GC with no refcounting, that will be a
meaningful distinction. :)

-- Talin

From martin at v.loewis.de  Mon Nov  6 07:49:14 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Mon, 06 Nov 2006 07:49:14 +0100
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <454E74ED.8070706@canterbury.ac.nz>
References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz>
	<454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz>
Message-ID: <454EDAEA.7050501@v.loewis.de>

Greg Ewing schrieb:
>> That should never be better: the system will cache the directory
>> blocks, also, and it will do a better job than Python will.
>
> If that's really the case, then why do discussions
> of how to improve Python startup speeds seem to focus
> on the number of stat calls made?

A stat call will not only look at the directory entry, but also look at
the inode. This will require another disk access, as the inode is at a
different location on the disk.

> Also, caching isn't the only thing to consider.
> Last time I looked at the implementation of unix
> file systems, they mostly seemed to do directory
> lookups by linear search. Unless that's changed
> a lot, I have a hard time seeing how that's
> going to beat Python's highly-tuned dictionaries.

It depends on the file system you are using. An NTFS directory lookup is
a B-tree search; NT has not been doing linear search since its
introduction 15 years ago. Linux only recently started doing tree-based
directories with the introduction of ext4. However, Linux's in-memory
directory cache (the dcache) doesn't need to scan over the directory
block structure; I'm not sure whether it still uses linear search.

For a small directory, the difference is likely negligible. For a large
directory, the cost of reading in the entire directory might be higher
than the savings gained from not having to search it.
Also, if we do our own directory caching, the question is when to invalidate the cache. Regards, Martin From martin at v.loewis.de Mon Nov 6 08:03:45 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 08:03:45 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: <454EDE51.7000307@v.loewis.de> Andrew Dalke schrieb: >> I find there is a difference between "urllib behaves >> non-intuitively" and "urllib gives result A for parameters B and C, >> but should give result D instead". Can you please add specific examples >> to your report that demonstrate the difference between implemented >> and expected behavior? > > No. > > I consider the "../" cases to be unimportant edge cases and > I would rather people fixed the other problems highlighted in the > text I copied from 4Suite's Uri.py -- like improperly allowing a > relative URL as the base url, which I incorrectly assumed was > legit - and that others have reported on python-dev, easily found > with Google. It still should be possible to come up with examples for these as well, no? For example, if you pass a relative URI as the base URI, what would you like to see happen? > If I only add test cases for "../" then I believe that that's all that > will be fixed. That's true. Actually, it's probably not true; it will only get fixed if some volunteer contributes a fix. > Finally, I see that my report is a dup. SF search is poor. As > Nick Coghlan reported, Paul Jimenez has a replacement for urlparse. 
> Summarized in > http://www.python.org/dev/summary/2006-04-01_2006-04-15/ > It was submitted in spring as a patch - SF# 1462525 at > http://sourceforge.net/tracker/index.php?func=detail&aid=1462525&group_id=5470&atid=305470 > which I didn't find in my earlier searching. So do you think this patch meets your requirements? This topic (URL parsing) is not only inherently difficult to implement, it is just as tedious to review. Without anybody reviewing the contributed code, it's certain that it will never be incorporated. Regards, Martin From dalke at dalkescientific.com Mon Nov 6 12:04:28 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Nov 2006 12:04:28 +0100 Subject: [Python-Dev] Path object design In-Reply-To: <454EDE51.7000307@v.loewis.de> References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> <454EDE51.7000307@v.loewis.de> Message-ID: Martin: > It still should be possible to come up with examples for these as > well, no? For example, if you pass a relative URI as the base > URI, what would you like to see happen? Until two days ago I didn't even realize that was an incorrect use of urljoin. I can't be the only one. Hence, raise an exception - just like 4Suite's Uri.py does. > That's true. Actually, it's probably not true; it will only get fixed > if some volunteer contributes a fix. And it's not I. A true fix is a lot of work. I would rather use Uri.py, now that I see it handles everything I care about, and then some. Eg, file name <-> URI conversion. > So do you think this patch meets your requirements? # new >>> uriparse.urljoin("http://spam/", "foo/bar") 'http://spam//foo/bar' >>> # existing >>> urlparse.urljoin("http://spam/", "foo/bar") 'http://spam/foo/bar' >>> No. That was the first thing I tried. 
Also found

>>> urlparse.urljoin("http://blah", "/spam/")
'http://blah/spam/'
>>> uriparse.urljoin("http://blah", "/spam/")
'http://blah/spam'
>>>

I reported these on the patch page. Nothing else strange came up, but I
did only try http urls and not the others.

My "requirements", meaning my vague, spur-of-the-moment thoughts without
any research or experimentation to determine their validity, are
different from those for Python. My real requirements are met by the
existing code. My imagined ones include support for edge cases, the idna
codec, unicode, and real-world use on a variety of OSes. 4Suite's Uri.py
seems to have this. Eg, lots of edge-case code like

    # On Windows, ensure that '|', not ':', is used in a drivespec.
    if os.name == 'nt' and scheme == 'file':
        path = path.replace(':','|',1)

Hence the uriparse.py patch does not meet my hypothetical requirements.

Python's requirements are probably to get closer to the spec. In which
case yes, it's at least as good as and likely generally better than the
existing module, modulo a few API naming debates and perhaps some rough
edges which will be found when put into use. And perhaps various
arguments about how bug compatible it should be and if the old code
should be available as well as the new one, for those who depend on the
existing 1808-allowed implementation dependent behavior. For those I
have not the experience to guide me and no care to push the debate.

I've decided I'm going to experiment using 4Suite's Uri.py for my code
because it handles things I want which are outside of the scope of
uriparse.py.

> This topic (URL parsing) is not only inherently difficult to
> implement, it is just as tedious to review. Without anybody
> reviewing the contributed code, it's certain that it will never
> be incorporated.

I have a different opinion. Python's url manipulation code is a mess.
urlparse, urllib, urllib2. Why is "urlencode" part of urllib and not
urllib2?
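The 'http://spam//foo/bar' result reported against the patch is the classic failure mode in the merge step of reference resolution: RFC 3986, section 5.3, says to keep the base path only up to and including its last "/", then append the reference — adding an extra "/" of your own doubles the slash. A from-scratch sketch of that step ("merge" is the RFC's name for it; the signature here is my own):

```python
def merge(base_path, ref_path, base_has_authority=True):
    # RFC 3986, section 5.3: merge a relative-path reference with the base path.
    if base_has_authority and not base_path:
        # Base has an authority component but an empty path: resolve against "/".
        return "/" + ref_path
    # Keep the base path up to AND INCLUDING its last "/", then append the
    # reference. No extra "/" is inserted, so a base path of "/" stays "/".
    return base_path[:base_path.rfind("/") + 1] + ref_path
```

With a base path of "/", `base_path[:rfind("/") + 1]` is just "/", so merging "foo/bar" yields "/foo/bar" rather than "//foo/bar".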
For that matter, urllib is labeled 'Open an arbitrary URL' and not 'and
also do manipulations on parts of URLs.'

I don't want to start fixing code because doing it the way I want to
requires a new API and a much better understanding of the RFCs than I
care about, especially since 4Suite and others have already done this.
Hence I would say to just grab their library. And perhaps update the
naming scheme.

Also, urlgrabber and pycURL are better for downloading arbitrary URIs.
For some definitions of "better".

Andrew
dalke at dalkescientific.com

From tds333+pydev at gmail.com  Mon Nov  6 13:18:37 2006
From: tds333+pydev at gmail.com (Wolfgang Langner)
Date: Mon, 6 Nov 2006 13:18:37 +0100
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <454CB619.7010804@v.loewis.de>
References: <454CB619.7010804@v.loewis.de>
Message-ID: <4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com>

Why not only import *.pyc files and no longer use *.pyo files. It is
simpler to have one compiled python file extension. PYC files can
contain optimized python byte code and normal byte code.

--
bye by Wolfgang

From arigo at tunes.org  Mon Nov  6 14:57:51 2006
From: arigo at tunes.org (Armin Rigo)
Date: Mon, 6 Nov 2006 14:57:51 +0100
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <454CB619.7010804@v.loewis.de>
References: <454CB619.7010804@v.loewis.de>
Message-ID: <20061106135751.GA29592@code0.codespeak.net>

Hi Martin,

On Sat, Nov 04, 2006 at 04:47:37PM +0100, "Martin v. Löwis" wrote:
> Patch #1346572 proposes to also search for .pyc when OptimizeFlag
> is set, and for .pyo when it is not set. The author argues this is
> for consistency, as the zipimporter already does that.

My strong opinion on the matter is that importing a .pyc file if the .py
file is not present is wrong in the first place. It caused many
headaches in several projects I worked on.
Additionally, trying to import .pyo files looks like complete semantic
nonsense, but I can't really argue from experience, as I never run
python -O.

Typical example: someone in the project removes a .py file, and checks
in this change; someone else does an 'svn up', which kills the .py in
his working copy, but not the .pyc. These stale .pyc's cause pain, e.g.
by shadowing the real module (further down sys.path), or simply by
preventing the project's developers from realizing that they forgot to
fix some imports. We regularly had obscure problems that went away as
soon as we deleted all .pyc files around, but I cannot comment more on
that because we never really investigated.

I know it's a discussion that comes up and dies out regularly. My two
cents is that it would be saner to have two separate concepts: cache
files used internally by the interpreter for speed reasons only, and
bytecode files that can be shipped and imported. This could e.g. be done
with different file extensions (then you just rename the files if you
want to ship them as bytecode without source), or with a temporary cache
directory (from where you can fish bytecode files if you want to ship
them).

Experience suggests I should not be holding my breath until something is
decided about this, though. If I were asked to come up with a patch I'd
simply propose one that removes importing of stale .pyc files (I'm
always running a version of Python with such a patch, to avoid the
above-mentioned troubles).
A bientot,

Armin

From fredrik at pythonware.com  Mon Nov  6 14:59:06 2006
From: fredrik at pythonware.com (Fredrik Lundh)
Date: Mon, 06 Nov 2006 14:59:06 +0100
Subject: [Python-Dev] Path object design
In-Reply-To: <454E6E5F.7070800@v.loewis.de>
References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com>
	<454BD1A9.8080508@v.loewis.de>
	<5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com>
	<454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de>
	<454E6E5F.7070800@v.loewis.de>
Message-ID:

Martin v. Löwis wrote:
> Andrew Dalke schrieb:
>>>>> urlparse.urljoin("http://blah.com/", "..")
>> 'http://blah.com/'
>>>>> urlparse.urljoin("http://blah.com/", "../")
>> 'http://blah.com/../'
>>>>> urlparse.urljoin("http://blah.com/", "../..")
>> 'http://blah.com/'
>>
>> Does the result make sense to you? Does it make
>> sense that the last of these is shorter than the middle
>> one? It sure doesn't to me. I thought it was obvious
>> that there was an error;
>
> That wasn't obvious at all to me. Now looking at the
> examples, I agree there is an error. The middle one
> is incorrect;
>
> urlparse.urljoin("http://blah.com/", "../")
>
> should also give 'http://blah.com/'.

make that: could also give 'http://blah.com/'.

as I said, today's urljoin doesn't guarantee that the output is the
*shortest* possible way to represent the resulting URI.
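The invariant being debated here can be checked mechanically. For what it's worth, the RFC 3986 resolution rules — which `urllib.parse.urljoin` follows in Python 3.5 and later — collapse all three of Andrew's examples to the same minimal result:

```python
from urllib.parse import urljoin  # Python 3.5+ urljoin follows RFC 3986

base = "http://blah.com/"
for rel in ("..", "../", "../.."):
    # Each excess ".." is clamped at the root, per RFC 3986 section 5.2.4.
    print("%-6s -> %s" % (rel, urljoin(base, rel)))
```

Under those rules the middle case no longer produces 'http://blah.com/../', so the shorter-than-the-middle-one anomaly Andrew points out disappears.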
From dalke at dalkescientific.com Mon Nov 6 15:57:59 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 6 Nov 2006 15:57:59 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: Andrew: > >>> urlparse.urljoin("http://blah.com/", "..") > 'http://blah.com/' > >>> urlparse.urljoin("http://blah.com/", "../") > 'http://blah.com/../' > >>> urlparse.urljoin("http://blah.com/", "../..") > 'http://blah.com/' /F: > as I said, today's urljoin doesn't guarantee that the output is > the *shortest* possible way to represent the resulting URI. I didn't think anyone was making that claim. The module claims RFC 1808 compliance. From the docstring: DESCRIPTION See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, UC Irvine, June 1995. Now quoting from RFC 1808: 5.2. Abnormal Examples Although the following abnormal examples are unlikely to occur in normal practice, all URL parsers should be capable of resolving them consistently. Each example uses the same base as above. An empty reference resolves to the complete base URL: <> = Parsers must be careful in handling the case where there are more relative path ".." segments than there are hierarchical levels in the base URL's path. My claim is that "consistent" implies "in the spirit of the rest of the RFC" and "to a human trying to make sense of the results" and not only mean "does the same thing each time." Else >>> urljoin("http://blah.com/", "../../..") 'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url' would be equally consistent. >>> for rel in ".. ../ ../.. ../../ ../../.. ../../../ ../../../..".split(): ... print repr(rel), repr(urlparse.urljoin("http://blah.com/", rel)) ... '..' 'http://blah.com/' '../' 'http://blah.com/../' '../..' 
'http://blah.com/' '../../' 'http://blah.com/../../' '../../..' 'http://blah.com/../' '../../../' 'http://blah.com/../../../' '../../../..' 'http://blah.com/../../' I grant there is a consistency there. It's not one most would have predicted beforehand. Then again, "should" is that wishy-washy "unless you've got a good reason to do it a different way" sort of constraint. Andrew dalke at dalkescientific.com From tomerfiliba at gmail.com Mon Nov 6 16:02:51 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 6 Nov 2006 17:02:51 +0200 Subject: [Python-Dev] __dir__, part 2 Message-ID: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> so, if you remember, i suggested adding __dir__ to objects, so as to make dir() customizable, remove the deprecated __methods__ and __members__, and make it symmetrical to other built-in functions. you can see the original post here: http://mail.python.org/pipermail/python-dev/2006-July/067095.html which was generally accepted by the forum: http://mail.python.org/pipermail/python-dev/2006-July/067139.html so i went on, now that i have some spare time, to research the issue. the current dir() works as follows: (*) builtin_dir calls PyObject_Dir to do the trick (*) if the object is NULL (dir with no argument), return the frame's locals (*) if the object is a *module*, we're just using it's __dict__ (*) if the object is a *type*, we're using it's __dict__ and __bases__, but not __class__ (so as not to show the metaclass) (*) otherwise, it's a "normal object", so we take it's __dict__, along with __methods__, __members__, and dir(__class__) (*) create a list of keys from the dict, sort, return we'll have to change that if we were to introduce __dir__. 
my design is: (*) builtin_dir, if called without an argument, returns the frame's locals (*) otherwise, it calls PyObject_Dir(self), which would dispatch self.__dir__() (*) if `self` doesn't have __dir__, default to object.__dir__(self) (*) the default object.__dir__ implementation would do the same as today: collect __dict__, __members__, __methods__, and dir(__class__). by py3k, we'll remove looking into __methods__ and __members__. (*) type objects and module objects would implement __dir__ to their liking (as PyObject_Dir does today) (*) builtin_dir would take care of sorting the list returned by PyObject_Dir so first i'd want you people to react on my design, maybe you'd find flaws whatever. also, should this become a PEP? and last, how do i add a new method slot? does it mean i need to change all type-object definitions throughout the codebase? do i add it to some protocol? or directly to the "object protocol"? -tomer From jcarlson at uci.edu Mon Nov 6 16:53:45 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 06 Nov 2006 07:53:45 -0800 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) In-Reply-To: <454ED5E8.4010709@acm.org> References: <454ED5E8.4010709@acm.org> Message-ID: <20061106075222.8221.JCARLSON@uci.edu> Talin wrote: > > Guido van Rossum wrote: > > I don't know how you define simple. In order to be able to have > > separate GILs you have to remove *all* sharing of objects between > > interpreters. And all other data structures, too. It would probably > > kill performance too, because currently obmalloc relies on the GIL. > > Nitpick: You have to remove all sharing of *mutable* objects. One day, > when we get "pure" GC with no refcounting, that will be a meaningful > distinction. :) Python already grew that feature a couple years back, but it never became mainline. 
Search google (I don't know the magic incantation off the top of my
head), but if I remember correctly, it wasn't a significant win if any
at all.

- Josiah

From guido at python.org  Mon Nov  6 17:07:07 2006
From: guido at python.org (Guido van Rossum)
Date: Mon, 6 Nov 2006 08:07:07 -0800
Subject: [Python-Dev] __dir__, part 2
In-Reply-To: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com>
References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com>
Message-ID:

Sounds like a good plan, though I'm not sure if it's worth doing in 2.6
-- I'd be happy with doing this just in 3k.

I'm not sure what you mean by "adding a method slot" -- certainly it's
possible to define a method __foo__ and call it directly without having
a special tp_foo in the type object, and I recommend doing it that way
since the tp_foo slots are just there to make things fast; in this case
I don't see a need for dir() to be fast.

--Guido

On 11/6/06, tomer filiba wrote:
> so, if you remember, i suggested adding __dir__ to objects, so as to make
> dir() customizable, remove the deprecated __methods__ and __members__,
> and make it symmetrical to other built-in functions.
>
> you can see the original post here:
> http://mail.python.org/pipermail/python-dev/2006-July/067095.html
> which was generally accepted by the forum:
> http://mail.python.org/pipermail/python-dev/2006-July/067139.html
>
> so i went on, now that i have some spare time, to research the issue.
> the current dir() works as follows: > (*) builtin_dir calls PyObject_Dir to do the trick > (*) if the object is NULL (dir with no argument), return the frame's locals > (*) if the object is a *module*, we're just using it's __dict__ > (*) if the object is a *type*, we're using it's __dict__ and __bases__, > but not __class__ (so as not to show the metaclass) > (*) otherwise, it's a "normal object", so we take it's __dict__, along with > __methods__, __members__, and dir(__class__) > (*) create a list of keys from the dict, sort, return > > we'll have to change that if we were to introduce __dir__. my design is: > (*) builtin_dir, if called without an argument, returns the frame's locals > (*) otherwise, it calls PyObject_Dir(self), which would dispatch self.__dir__() > (*) if `self` doesn't have __dir__, default to object.__dir__(self) > (*) the default object.__dir__ implementation would do the same as > today: collect __dict__, __members__, __methods__, and dir(__class__). > by py3k, we'll remove looking into __methods__ and __members__. > (*) type objects and module objects would implement __dir__ to their > liking (as PyObject_Dir does today) > (*) builtin_dir would take care of sorting the list returned by PyObject_Dir > > so first i'd want you people to react on my design, maybe you'd find > flaws whatever. also, should this become a PEP? > > and last, how do i add a new method slot? does it mean i need to > change all type-object definitions throughout the codebase? > do i add it to some protocol? or directly to the "object protocol"? 
> > > -tomer > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Mon Nov 6 16:48:45 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 06 Nov 2006 16:48:45 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: Andrew Dalke wrote: >> as I said, today's urljoin doesn't guarantee that the output is >> the *shortest* possible way to represent the resulting URI. > > I didn't think anyone was making that claim. The module claims > RFC 1808 compliance. From the docstring: > > DESCRIPTION > See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, > UC Irvine, June 1995. > > Now quoting from RFC 1808: > > 5.2. Abnormal Examples > > Although the following abnormal examples are unlikely to occur in > normal practice, all URL parsers should be capable of resolving them > consistently. > My claim is that "consistent" implies "in the spirit of the rest of the RFC" > and "to a human trying to make sense of the results" and not only > mean "does the same thing each time." Else > >>>> urljoin("http://blah.com/", "../../..") > 'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url' > > would be equally consistent. perhaps, but such an urljoin wouldn't pass the minimize(base + relative) == minimize(urljoin(base, relative)) test that today's urljoin passes (where "minimize" is defined as "create the shortest possible URI that identifies the same target, according to the relevant RFC"). 
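[Editor's sketch: the resolution cases being debated can be exercised directly. This assumes a modern Python 3, whose urllib.parse.urljoin follows RFC 3986 section 5 rather than RFC 1808, so the abnormal examples already come out in minimized form:]

```python
from urllib.parse import urljoin

# Normal example (RFC 3986, section 5.4.1): resolve a relative reference.
print(urljoin("http://a/b/c/d;p?q", "../g"))         # http://a/b/g

# Abnormal examples (section 5.4.2): excess ".." segments are discarded
# by the remove_dot_segments algorithm rather than kept literally.
print(urljoin("http://blah.com/", "../"))            # http://blah.com/
print(urljoin("http://a/b/c/d;p?q", "../../../g"))   # http://a/g
```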
isn't the real issue in this subthread whether urljoin should be expected to pass the minimize(base + relative) == urljoin(base, relative) test? From jcarlson at uci.edu Mon Nov 6 17:36:14 2006 From: jcarlson at uci.edu (Josiah Carlson) Date: Mon, 06 Nov 2006 08:36:14 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061106135751.GA29592@code0.codespeak.net> References: <454CB619.7010804@v.loewis.de> <20061106135751.GA29592@code0.codespeak.net> Message-ID: <20061106082638.822F.JCARLSON@uci.edu> Armin Rigo wrote: > Hi Martin, > On Sat, Nov 04, 2006 at 04:47:37PM +0100, "Martin v. Löwis" wrote: > > Patch #1346572 proposes to also search for .pyc when OptimizeFlag > > is set, and for .pyo when it is not set. The author argues this is > > for consistency, as the zipimporter already does that. > > My strong opinion on the matter is that importing a .pyc file if the .py > file is not present is wrong in the first place. It caused many > headaches in several projects I worked on. > > Typical example: someone in the project removes a .py file, and checks > in this change; someone else does an 'svn up', which kills the .py in > his working copy, but not the .pyc. These stale .pyc's cause pain, e.g. > by shadowing the real module (further down sys.path), or simply by > preventing the project's developers from realizing that they forgot to > fix some imports. We regularly had obscure problems that went away as > soon as we deleted all .pyc files around, but I cannot comment more on > that because we never really investigated. I had a very similar problem the other week when mucking about with a patch to ntpath. I had it in a somewhat small temporary projects folder and needed to run another project. It picked up the local ntpath.py when importing path.py, but then failed because I was working on a 2.5 derived ntpath, but I was using 2.3 to run the other project.
After renaming the local ntpath, I continued to get the error until I realized "damn pyc" and was halfway through a filesystem-wide search for the problem code (10 minutes elapsed). About the only place where I have found the need for pyc-without-py importing is for zipimports, specifically as used by py2exe and other freezing applications. I don't know if we want to add a new command line option, or a __future__ import, or something, but I think there should be some method of warning people that an import was performed without source code. - Josiah From martin at v.loewis.de Mon Nov 6 18:55:04 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 18:55:04 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> <454EDE51.7000307@v.loewis.de> Message-ID: <454F76F8.5090805@v.loewis.de> Andrew Dalke schrieb: > Hence I would say to just grab their library. And perhaps update the > naming scheme. Unfortunately, this is not an option. *You* can just grab their library; the Python distribution can't. Doing so would mean to fork, and history tells that forks cause problems in the long run. OTOH, if the 4Suite people would contribute the library, integrating it would be an option. Regards, Martin From martin at v.loewis.de Mon Nov 6 19:00:22 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 19:00:22 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061106135751.GA29592@code0.codespeak.net> References: <454CB619.7010804@v.loewis.de> <20061106135751.GA29592@code0.codespeak.net> Message-ID: <454F7836.9030807@v.loewis.de> Armin Rigo schrieb: > My strong opinion on the matter is that importing a .pyc file if the .py > file is not present is wrong in the first place.
There is, of course, an important use case (which you are addressing with a different approach): people want to ship only byte code, not source code, because they feel it protects their IP better, and also for space reasons. So outright ignoring pyc files is not really an option. > I know it's a discussion that comes up and dies out regularly. My two > cents is that it would be saner to have two separate concepts: cache > files used internally by the interpreter for speed reasons only, and > bytecode files that can be shipped and imported. There once was a PEP to better control byte code file generation; it died because it wasn't implemented. I don't think there is a strong opposition to changing the status quo - it's just that you need a well-designed specification before you start, a serious, all-singing-all-dancing implementation, and a lot of test cases. I believe it is these constraints which have prevented any progress here. Regards, Martin From martin at v.loewis.de Mon Nov 6 19:02:13 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 19:02:13 +0100 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454BD1A9.8080508@v.loewis.de> <5.1.1.6.0.20061103205115.0276da50@sparrow.telecommunity.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: <454F78A5.6000400@v.loewis.de> Fredrik Lundh schrieb: >> urlparse.urljoin("http://blah.com/", "../") >> >> should also give 'http://blah.com/'. > > make that: could also give 'http://blah.com/'. How so? If that would implement RFC 3986, you can get only a single outcome, if urljoin is meant to implement section 5 of that RFC. 
Regards, Martin From martin at v.loewis.de Mon Nov 6 19:03:57 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 06 Nov 2006 19:03:57 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com> References: <454CB619.7010804@v.loewis.de> <4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com> Message-ID: <454F790D.6090508@v.loewis.de> Wolfgang Langner schrieb: > Why not only import *.pyc files and no longer use *.pyo files. > > It is simpler to have one compiled python file extension. > PYC files can contain optimized python byte code and normal byte code. So what would you do with the -O option of the interpreter? Regards, Martin From rasky at develer.com Mon Nov 6 20:54:35 2006 From: rasky at develer.com (Giovanni Bajo) Date: Mon, 6 Nov 2006 20:54:35 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa References: <454CB619.7010804@v.loewis.de> <20061106135751.GA29592@code0.codespeak.net> Message-ID: <06d201c701dd$62aa8110$c003030a@trilan> Armin Rigo wrote: > Typical example: someone in the project removes a .py file, and checks > in this change; someone else does an 'svn up', which kills the .py in > his working copy, but not the .pyc. These stale .pyc's cause pain, > e.g. > by shadowing the real module (further down sys.path), or simply by > preventing the project's developers from realizing that they forgot to > fix some imports. We regularly had obscure problems that went away as > soon as we deleted all .pyc files around, but I cannot comment more on > that because we never really investigated. 
This is exactly why I always use this module:

================== nobarepyc.py ============================
#!/usr/bin/env python
#-*- coding: utf-8 -*-
import ihooks
import os

class _NoBarePycHooks(ihooks.Hooks):
    def load_compiled(self, name, filename, *args, **kwargs):
        sourcefn = os.path.splitext(filename)[0] + ".py"
        if not os.path.isfile(sourcefn):
            raise ImportError('forbidden import of bare .pyc file: %r' % filename)
        return ihooks.Hooks.load_compiled(self, name, filename, *args, **kwargs)

ihooks.ModuleImporter(ihooks.ModuleLoader(_NoBarePycHooks())).install()
================== /nobarepyc.py ============================

Just import it before importing anything else (or in site.py if you prefer) and you'll be done. Ah, it doesn't work with zipimports... -- Giovanni Bajo From steve at holdenweb.com Mon Nov 6 20:48:55 2006 From: steve at holdenweb.com (Steve Holden) Date: Mon, 06 Nov 2006 14:48:55 -0500 Subject: [Python-Dev] Path object design In-Reply-To: References: <20061101215724.14394.1801823509.divmod.xquotient.351@joule.divmod.com> <454CCB03.1030806@holdenweb.com> <454E39E5.8040604@v.loewis.de> <454E6E5F.7070800@v.loewis.de> Message-ID: Fredrik Lundh wrote: > Andrew Dalke wrote: > > >>>as I said, today's urljoin doesn't guarantee that the output is >>>the *shortest* possible way to represent the resulting URI. >> >>I didn't think anyone was making that claim. The module claims >>RFC 1808 compliance. From the docstring: >> >> DESCRIPTION >> See RFC 1808: "Relative Uniform Resource Locators", by R. Fielding, >> UC Irvine, June 1995. >> >>Now quoting from RFC 1808: >> >> 5.2. Abnormal Examples >> >> Although the following abnormal examples are unlikely to occur in >> normal practice, all URL parsers should be capable of resolving them >> consistently. > > >>My claim is that "consistent" implies "in the spirit of the rest of the RFC" >>and "to a human trying to make sense of the results" and not only >>mean "does the same thing each time."
Else >> >> >>>>>urljoin("http://blah.com/", "../../..") >> >>'http://blah.com/there/were/too/many/dot-dot/path/elements/in/the/relative/url' >> >>would be equally consistent. > > > perhaps, but such an urljoin wouldn't pass the > > minimize(base + relative) == minimize(urljoin(base, relative)) > > test that today's urljoin passes (where "minimize" is defined as "create > the shortest possible URI that identifies the same target, according to > the relevant RFC"). > > isn't the real issue in this subthread whether urljoin should be > expected to pass the > > minimize(base + relative) == urljoin(base, relative) > > test? > I should hope that *is* the issue, and I should further hope that the general wish would be for it to pass that test. Of course web systems have been riddled with canonicalization errors in the past, so it'd be best if you and/or Andrew could provide a minimize() implementation :-) regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From rasky at develer.com Mon Nov 6 21:01:19 2006 From: rasky at develer.com (Giovanni Bajo) Date: Mon, 6 Nov 2006 21:01:19 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa References: <454CB619.7010804@v.loewis.de><4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com> <454F790D.6090508@v.loewis.de> Message-ID: <071a01c701de$533a0fb0$c003030a@trilan> Martin v. Löwis wrote: >> Why not only import *.pyc files and no longer use *.pyo files. >> >> It is simpler to have one compiled python file extension. >> PYC files can contain optimized python byte >> code. > > So what would you do with the -O option of the interpreter? I just had an idea: we could have only pyc files, and *no* way to identify whether specific "optimizations" (-O, -OO --only-strip-docstrings, whatever) were performed on them or not.
So, if you regularly run different python applications with different optimization settings, you'll end up with .pyc files containing bytecode that was generated with mixed optimization settings. It doesn't really matter in most cases, after all. Then, we add a single command line option (eg: "-I") which is: "ignore *every* .pyc file out there, and regenerate them as needed". So, the few times that you really care that a certain application is run with a specific setting, you can use "python -I -OO app.py". And that's all. -- Giovanni Bajo From brett at python.org Mon Nov 6 21:17:58 2006 From: brett at python.org (Brett Cannon) Date: Mon, 6 Nov 2006 12:17:58 -0800 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <071a01c701de$533a0fb0$c003030a@trilan> References: <454CB619.7010804@v.loewis.de> <4c45c1530611060418j539d3a8erfa9ea63cfe474d3a@mail.gmail.com> <454F790D.6090508@v.loewis.de> <071a01c701de$533a0fb0$c003030a@trilan> Message-ID: On 11/6/06, Giovanni Bajo wrote: > > Martin v. Löwis wrote: > > >> Why not only import *.pyc files and no longer use *.pyo files. > >> > >> It is simpler to have one compiled python file extension. > >> PYC files can contain optimized python byte > >> code. > > > > So what would you do with the -O option of the interpreter? > > I just had an idea: we could have only pyc files, and *no* way to identify > whether specific "optimizations" (-O, -OO --only-strip-docstrings, > whatever) > were performed on them or not. So, if you regularly run different python > applications with different optimization settings, you'll end up with .pyc > files containing bytecode that was generated with mixed optimization > settings. It doesn't really matter in most cases, after all. I don't know about that. If you suspected that a failure could be because of some bytecode optimization you were trying wouldn't you like to be able to tell easily that fact?
Granted our situation is not as bad as gcc in terms of the impact of having to regenerate a compiled version, but it still would be nice to be able to make sure that every .pyc file is the same. We would need to make it easy to blast out every .pyc file found if we did allow mixing of optimizations (as you suggest below). Then, we add a single command line option (eg: "-I") which is: "ignore > *every* .pyc file out there, and regenerate them as needed". So, the few > times that you really care that a certain application is run with a > specific > setting, you can use "python -I -OO app.py". That might work. -Brett From tomerfiliba at gmail.com Mon Nov 6 22:55:11 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Mon, 6 Nov 2006 23:55:11 +0200 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> Message-ID: <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> cool. first time i build the entire interpreter, 'twas fun :) currently i "retained" support for __members__ and __methods__, so it doesn't break anything and is compatible with 2.6. i really hope it will be included in 2.6 as today i'm using ugly hacks in RPyC to make remote objects appear like local ones. having __dir__ solves all of my problems. besides, it makes a lot of sense to define __dir__ for classes that define __getattr__. i don't think it should be pushed back to py3k. here's the patch: http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470 here's a demo: >>> class foo(object): ... def __dir__(self): ... return ["kan", "ga", "roo"] ...
>>> f = foo() >>> f <__main__.foo object at 0x00A90C78> >>> dir() ['__builtins__', '__doc__', '__name__', 'f', 'foo'] >>> dir(f) ['ga', 'kan', 'roo'] >>> dir(foo) ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__'] >>> class bar(object): ... __members__ = ["bow", "wow"] ... >>> b=bar() >>> dir(b) ['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__', '__hash__', '__init__', '__members__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__', 'bow', 'wow'] -tomer On 11/6/06, Guido van Rossum wrote: > Sounds like a good plan, though I'm not sure if it's worth doing in > 2.6 -- I'd be happy with doing this just in 3k. > > I'm not sure what you mean by "adding a method slot" -- certainly it's > possible to define a method __foo__ and call it directly without > having a special tp_foo in the type object, and I recommend doing it > that way since the tp_foo slots are just there to make things fast; in > this case I don't see a need for dir() to be fast. > > --Guido > > On 11/6/06, tomer filiba wrote: > > so, if you remember, i suggested adding __dir__ to objects, so as to make > > dir() customizable, remove the deprecated __methods__ and __members__, > > and make it symmetrical to other built-in functions. > > > > you can see the original post here: > > http://mail.python.org/pipermail/python-dev/2006-July/067095.html > > which was generally accepted by the forum: > > http://mail.python.org/pipermail/python-dev/2006-July/067139.html > > > > so i went on, now that i have some spare time, to research the issue.
> > the current dir() works as follows: > > (*) builtin_dir calls PyObject_Dir to do the trick > > (*) if the object is NULL (dir with no argument), return the frame's locals > > (*) if the object is a *module*, we're just using it's __dict__ > > (*) if the object is a *type*, we're using it's __dict__ and __bases__, > > but not __class__ (so as not to show the metaclass) > > (*) otherwise, it's a "normal object", so we take it's __dict__, along with > > __methods__, __members__, and dir(__class__) > > (*) create a list of keys from the dict, sort, return > > > > we'll have to change that if we were to introduce __dir__. my design is: > > (*) builtin_dir, if called without an argument, returns the frame's locals > > (*) otherwise, it calls PyObject_Dir(self), which would dispatch self.__dir__() > > (*) if `self` doesn't have __dir__, default to object.__dir__(self) > > (*) the default object.__dir__ implementation would do the same as > > today: collect __dict__, __members__, __methods__, and dir(__class__). > > by py3k, we'll remove looking into __methods__ and __members__. > > (*) type objects and module objects would implement __dir__ to their > > liking (as PyObject_Dir does today) > > (*) builtin_dir would take care of sorting the list returned by PyObject_Dir > > > > so first i'd want you people to react on my design, maybe you'd find > > flaws whatever. also, should this become a PEP? > > > > and last, how do i add a new method slot? does it mean i need to > > change all type-object definitions throughout the codebase? > > do i add it to some protocol? or directly to the "object protocol"? 
> > > > > > -tomer > > _______________________________________________ > > Python-Dev mailing list > > Python-Dev at python.org > > http://mail.python.org/mailman/listinfo/python-dev > > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > > > > > -- > --Guido van Rossum (home page: http://www.python.org/~guido/) > From ncoghlan at gmail.com Mon Nov 6 23:57:07 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Tue, 07 Nov 2006 08:57:07 +1000 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> Message-ID: <454FBDC3.3060100@gmail.com> tomer filiba wrote: > cool. first time i build the entire interpreter, 'twas fun :) > currently i "retained" support for __members__ and __methods__, > so it doesn't break anything and is compatible with 2.6. > > i really hope it will be included in 2.6 as today i'm using ugly hacks > in RPyC to make remote objects appear like local ones. > having __dir__ solves all of my problems. > > besides, it makes a lot of sense of define __dir__ for classes that > define __getattr__. i don't think it should be pushed back to py3k. > > here's the patch: > http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470 As I noted on the tracker, PyObject_Dir is a public C API function, so it's behaviour needs to be preserved as well as the behaviour of calling dir() from Python code. So the final form of the patch will likely need to include stronger tests for that section of the API, as well as updating the documentation in various places (the dir and PyObject_Dir documentation, obviously, but also the list of magic methods in the language reference). +1 on targeting 2.6, too. Cheers, Nick. 
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From greg.ewing at canterbury.ac.nz Tue Nov 7 00:20:00 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Tue, 07 Nov 2006 12:20:00 +1300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454EDAEA.7050501@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> Message-ID: <454FC320.9050604@canterbury.ac.nz> Martin v. Löwis wrote: > A stat call will not only look at the directory entry, but also > look at the inode. This will require another disk access, as the > inode is at a different location of the disk. That should be in favour of the directory-reading approach, since e.g. to find out which if any of x.py/x.pyc/x.pyo exists, you only need to look for the names. > It depends on the file system you are using. An NTFS directory > lookup is a B-Tree search; ... Yes, I know that some file systems are smarter; MacOS HFS is another one that uses b-trees. However it still seems to me that looking up a path in a file system is a much heavier operation than looking up a Python dict, even if everything is in memory. You have to parse the path, and look up each component separately in a different directory tree or whatever. The way I envisage it, you would read all the directories and build a single dictionary mapping fully-qualified module names to pathnames. Any given import then takes at most one dict lookup and one access of a known-to-exist file. > For > a large directory, the cost of reading in the entire directory > might be higher than the savings gained from not having to > search it. Possibly. I guess we'd need some timings to assess the meaning of "large".
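[Editor's sketch: Greg's single-dictionary scheme might look roughly like this in pure Python. A toy model with invented names; a real importer would also need to handle packages, extension modules, path precedence within a directory, and explicit cache invalidation:]

```python
import os
import sys
import tempfile

def build_module_cache(directories):
    # Read each directory once, mapping module name -> file path.
    # Earlier entries win, matching normal sys.path precedence.
    cache = {}
    for directory in directories:
        if not os.path.isdir(directory):
            continue
        try:
            entries = sorted(os.listdir(directory))
        except OSError:
            continue
        for name in entries:
            base, ext = os.path.splitext(name)
            if ext in ('.py', '.pyc', '.pyo') and base not in cache:
                cache[base] = os.path.join(directory, name)
    return cache

# An import then costs one dict lookup plus one open of a file that is
# known to exist; the cache must be rebuilt explicitly when files change.
demo = tempfile.mkdtemp()
open(os.path.join(demo, 'spam.py'), 'w').close()
cache = build_module_cache([demo] + sys.path)
print(cache['spam'])  # path to spam.py in the temporary directory
```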
> Also, if we do our own directory caching, the question > is when to invalidate the cache. I think I'd be happy with having to do that explicitly. I expect the vast majority of Python programs don't need to track changes to the set of importable modules during execution. The exceptions would be things like IDEs, and they could do a cache flush before reloading a module, etc. -- Greg From exarkun at divmod.com Tue Nov 7 00:33:36 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Mon, 6 Nov 2006 18:33:36 -0500 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454FC320.9050604@canterbury.ac.nz> Message-ID: <20061106233336.20948.1525550504.divmod.quotient.16145@ohm> On Tue, 07 Nov 2006 12:20:00 +1300, Greg Ewing wrote: > >I think I'd be happy with having to do that explicitly. >I expect the vast majority of Python programs don't >need to track changes to the set of importable modules >during execution. The exceptions would be things like >IDEs, and they could do a cache flush before reloading >a module, etc. Another questionable optimization which changes application-level semantics. No, please? Jean-Paul From martin at v.loewis.de Tue Nov 7 00:38:57 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Nov 2006 00:38:57 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454FC320.9050604@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> Message-ID: <454FC791.7080106@v.loewis.de> Greg Ewing schrieb: > I think I'd be happy with having to do that explicitly. > I expect the vast majority of Python programs don't > need to track changes to the set of importable modules > during execution.
The exceptions would be things like > IDEs, and they could do a cache flush before reloading > a module, etc. That would be a change in behavior, of course. Currently, you can put a file on disk and import it immediately; that will stop working. I'm pretty sure that there are a number of applications that rely on this specific detail of the current implementation (and not only IDEs). It still might be worthwhile to make such a change, but I'd like to see practical advantages demonstrated first. Regards, Martin From tdelaney at avaya.com Tue Nov 7 00:53:26 2006 From: tdelaney at avaya.com (Delaney, Timothy (Tim)) Date: Tue, 7 Nov 2006 10:53:26 +1100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa Message-ID: <2773CAC687FD5F4689F526998C7E4E5FF1EB56@au3010avexu1.global.avaya.com> "Martin v. Löwis" wrote: > Greg Ewing schrieb: >> I think I'd be happy with having to do that explicitly. >> I expect the vast majority of Python programs don't >> need to track changes to the set of importable modules >> during execution. The exceptions would be things like >> IDEs, and they could do a cache flush before reloading >> a module, etc. > > That would be a change in behavior, of course. > > Currently, you can put a file on disk and import it > immediately; that will stop working. I'm pretty sure > that there are a number of applications that rely > on this specific detail of the current implementation > (and not only IDEs). Would it be reasonable to always do a stat() on the directory, reloading if there's been a change? Would this be reliable across platforms? Tim Delaney From hg211 at hszk.bme.hu Tue Nov 7 03:11:31 2006 From: hg211 at hszk.bme.hu (Herman Geza) Date: Tue, 7 Nov 2006 03:11:31 +0100 (MET) Subject: [Python-Dev] valgrind Message-ID: Hi! I've embedded python into my application. Using valgrind I got a lot of errors. I understand that "Conditional jump or move depends on uninitialised value(s)" errors are completely ok (from Misc/README.valgrind).
However, I don't understand why "Invalid read"'s are legal, like this: ==21737== Invalid read of size 4 ==21737== at 0x408DDDF: PyObject_Free (in /usr/lib/libpython2.4.so.1.0) ==21737== by 0x4096F67: (within /usr/lib/libpython2.4.so.1.0) ==21737== by 0x408A5AC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0) ==21737== by 0x40C65F8: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0) ==21737== Address 0xC02E010 is 32 bytes inside a block of size 40 free'd ==21737== at 0x401D139: free (vg_replace_malloc.c:233) ==21737== by 0x408DE00: PyObject_Free (in /usr/lib/libpython2.4.so.1.0) ==21737== by 0x407BB4D: (within /usr/lib/libpython2.4.so.1.0) ==21737== by 0x407A3D6: (within /usr/lib/libpython2.4.so.1.0) Here python reads from an already-freed memory area, right? (I don't think that Misc/README.valgrind answers this question). Or is it a false alarm? Thanks, Geza Herman From martin at v.loewis.de Tue Nov 7 07:19:24 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Nov 2006 07:19:24 +0100 Subject: [Python-Dev] valgrind In-Reply-To: References: Message-ID: <4550256C.1020109@v.loewis.de> Herman Geza schrieb: > Here python reads from an already-freed memory area, right? It looks like it, yes. Of course, it could be a flaw in valgrind, too. To find out, one would have to understand what the memory block is, and what part of PyObject_Free accesses it. Regards, Martin From nnorwitz at gmail.com Tue Nov 7 08:02:22 2006 From: nnorwitz at gmail.com (Neal Norwitz) Date: Mon, 6 Nov 2006 23:02:22 -0800 Subject: [Python-Dev] valgrind In-Reply-To: <4550256C.1020109@v.loewis.de> References: <4550256C.1020109@v.loewis.de> Message-ID: On 11/6/06, "Martin v. Löwis" wrote: > Herman Geza schrieb: > > Here python reads from an already-freed memory area, right? > > It looks like it, yes. Of course, it could be a flaw in valgrind, too. > To find out, one would have to understand what the memory block is, > and what part of PyObject_Free accesses it.
I'm a bit confused. I ran with valgrind ./python -c pass which returns 23 invalid read problems (some are the same chunk of memory). This is with 2.5 (more or less). Valgrind 3.2.1 on amd64. Every address ended with 0x5...020. That seems odd. I looked through the valgrind bug reports and didn't see anything. The first problem reported was: Invalid read of size 4 at 0x44FA06: Py_ADDRESS_IN_RANGE (obmalloc.c:1741) by 0x44E225: PyObject_Free (obmalloc.c:920) by 0x44EB90: _PyObject_DebugFree (obmalloc.c:1361) by 0x444A28: dictresize (dictobject.c:546) by 0x444D5B: PyDict_SetItem (dictobject.c:655) by 0x462533: PyString_InternInPlace (stringobject.c:4920) by 0x448450: PyDict_SetItemString (dictobject.c:2120) by 0x4C240A: PyModule_AddObject (modsupport.c:615) by 0x428B00: _PyExc_Init (exceptions.c:2117) by 0x4C449A: Py_InitializeEx (pythonrun.c:225) by 0x4C4827: Py_Initialize (pythonrun.c:315) by 0x41270A: Py_Main (main.c:449) Address 0x52AE020 is 4,392 bytes inside a block of size 5,544 free'd at 0x4A1A828: free (vg_replace_malloc.c:233) by 0x5071635: qsort (in /lib/libc-2.3.5.so) by 0x474E4B: init_slotdefs (typeobject.c:5368) by 0x47522E: add_operators (typeobject.c:5511) by 0x46E3A1: PyType_Ready (typeobject.c:3209) by 0x46E2D4: PyType_Ready (typeobject.c:3173) by 0x44D13E: _Py_ReadyTypes (object.c:1864) by 0x4C4362: Py_InitializeEx (pythonrun.c:183) by 0x4C4827: Py_Initialize (pythonrun.c:315) by 0x41270A: Py_Main (main.c:449) by 0x411CD2: main (python.c:23) Note that the free is inside qsort. The memory freed under qsort should definitely not be the bases which we allocated under PyType_Ready. I'll file a bug report with valgrind to help determine if this is a problem in Python or valgrind. http://bugs.kde.org/show_bug.cgi?id=136989 One other thing that is weird is that the complaint is about 4 bytes which should not be possible. All pointers should be 8 bytes AFAIK since this is amd64. I also ran this on x86. 
There were 32 errors and all of their addresses were 0x4...010.

n

From tim.peters at gmail.com  Tue Nov  7 08:20:14 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 7 Nov 2006 02:20:14 -0500
Subject: [Python-Dev] valgrind
In-Reply-To: <4550256C.1020109@v.loewis.de>
References: <4550256C.1020109@v.loewis.de>
Message-ID: <1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>

[Herman Geza]
>> Here python reads from an already-freed memory area, right?

[Martin v. Löwis]
> It looks like it, yes. Of course, it could be a flaw in valgrind, too.
> To find out, one would have to understand what the memory block is,
> and what part of PyObject_Free accesses it.

When PyObject_Free is handed an address it doesn't control, the "arena
base address" it derives from that address may point at anything the
system malloc controls, including uninitialized memory, memory the
system malloc has allocated to something, memory the system malloc has
freed, or internal system malloc bookkeeping bytes.  The
Py_ADDRESS_IN_RANGE macro has no way to know before reading it up.

So figure out which line of code valgrind is complaining about (doesn't
valgrind usually produce that?).  If it's coming from the expansion of
Py_ADDRESS_IN_RANGE, it's not worth more thought.

From guido at python.org  Tue Nov  7 09:00:14 2006
From: guido at python.org (Guido van Rossum)
Date: Tue, 7 Nov 2006 00:00:14 -0800
Subject: [Python-Dev] __dir__, part 2
In-Reply-To: <454FBDC3.3060100@gmail.com>
References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com>
	<1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com>
	<454FBDC3.3060100@gmail.com>
Message-ID: 

No objection on targeting 2.6 if other developers agree.  Seems this is
well under way.  Good work!

On 11/6/06, Nick Coghlan wrote:
> tomer filiba wrote:
> > cool.
first time i build the entire interpreter, 'twas fun :) > > currently i "retained" support for __members__ and __methods__, > > so it doesn't break anything and is compatible with 2.6. > > > > i really hope it will be included in 2.6 as today i'm using ugly hacks > > in RPyC to make remote objects appear like local ones. > > having __dir__ solves all of my problems. > > > > besides, it makes a lot of sense of define __dir__ for classes that > > define __getattr__. i don't think it should be pushed back to py3k. > > > > here's the patch: > > http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470 > > As I noted on the tracker, PyObject_Dir is a public C API function, so it's > behaviour needs to be preserved as well as the behaviour of calling dir() from > Python code. > > So the final form of the patch will likely need to include stronger tests for > that section of the API, as well as updating the documentation in various > places (the dir and PyObject_Dir documentation, obviously, but also the list > of magic methods in the language reference). > > +1 on targeting 2.6, too. > > Cheers, > Nick. 
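To make the use case concrete: a proxy that serves attributes out of __getattr__ is invisible to dir() unless it also implements __dir__, which is exactly what the patch enables. A minimal sketch of the pattern tomer describes — the class and its attribute list are hypothetical, not RPyC's actual API:

```python
class RemoteProxy:
    """Toy stand-in for an RPyC-style proxy whose attributes live remotely."""

    def __init__(self, remote_names):
        self._remote_names = list(remote_names)

    def __getattr__(self, name):
        # Called only when normal lookup fails; pretend to fetch remotely.
        if name in self._remote_names:
            return "<remote value of %s>" % name
        raise AttributeError(name)

    def __dir__(self):
        # Advertise the remote attributes so dir() and tab-completion see
        # them, instead of the empty picture __getattr__ alone would give.
        return sorted(set(self._remote_names) | set(type(self).__dict__))


p = RemoteProxy(["ping", "pong"])
assert "ping" in dir(p) and p.ping == "<remote value of ping>"
```

Without the __dir__ method, dir(p) would show only the class machinery; with it, a remote object "appears like a local one" to introspection tools.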
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From ronaldoussoren at mac.com Tue Nov 7 09:57:31 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Tue, 7 Nov 2006 09:57:31 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <454FC320.9050604@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> Message-ID: <85B44252-6E88-434C-A126-745018C66B25@mac.com> On 7Nov 2006, at 12:20 AM, Greg Ewing wrote: > >> Also, if we do our own directory caching, the question >> is when to invalidate the cache. > > I think I'd be happy with having to do that explicitly. > I expect the vast majority of Python programs don't > need to track changes to the set of importable modules > during execution. The exceptions would be things like > IDEs, and they could do a cache flush before reloading > a module, etc. Not only IDE's, also the interactive prompt. It is very convenient that you can currently install an additional module when an import fails and then try the import again (at the python prompt). Ronald -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3562 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20061107/9dd3183f/attachment.bin

From anthony at interlink.com.au  Tue Nov  7 14:19:46 2006
From: anthony at interlink.com.au (Anthony Baxter)
Date: Wed, 8 Nov 2006 00:19:46 +1100
Subject: [Python-Dev] __dir__, part 2
In-Reply-To: 
References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com>
	<454FBDC3.3060100@gmail.com>
Message-ID: <200611080019.47931.anthony@interlink.com.au>

On Tuesday 07 November 2006 19:00, Guido van Rossum wrote:
> No objection on targeting 2.6 if other developers agree.  Seems this
> is well under way.  Good work!

Sounds fine to me! Less magic under the hood is less magic, and that's
always a good thing. The use case for it seems completely appropriate,
too.

Anthony
-- 
Anthony Baxter
It's never too late to have a happy childhood.

From kristjan at ccpgames.com  Tue Nov  7 15:05:14 2006
From: kristjan at ccpgames.com (=?iso-8859-1?Q?Kristj=E1n_V=2E_J=F3nsson?=)
Date: Tue, 7 Nov 2006 14:05:14 -0000
Subject: [Python-Dev] valgrind
Message-ID: <129CEF95A523704B9D46959C922A2800047D98E3@nemesis.central.ccp.cc>

You want to disable the obmalloc module when using valgrind, as I have
when using Rational Purify.  obmalloc does some evil stuff to recognize
its memory.  You also want to disable it so that you get verification
on a per-block level.

Actually, obmalloc could be improved in this aspect.  Similar code that
I once wrote computed the block base address, but then looked in its
tables to see if it was actually a known block before accessing it.
That way you can have blocks that are larger than the virtual memory
block of the process.

K

> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com at python.org
> [mailto:python-dev-bounces+kristjan=ccpgames.com at python.org]
> On Behalf Of Herman Geza
> Sent: 7. nóvember 2006 02:12
> To: python-dev at python.org
> Subject: [Python-Dev] valgrind
>
> Hi!
>
> I've embedded python into my application. Using valgrind I
> got a lot of errors. I understand that "Conditional jump or
> move depends on uninitialised value(s)" errors are completely
> ok (from Misc/README.valgrind). However, I don't understand
> why "Invalid read"'s are legal, like this:
>
> ==21737== Invalid read of size 4
> ==21737==    at 0x408DDDF: PyObject_Free (in /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x4096F67: (within /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x408A5AC: PyCFunction_Call (in /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x40C65F8: PyEval_EvalFrame (in /usr/lib/libpython2.4.so.1.0)
> ==21737==  Address 0xC02E010 is 32 bytes inside a block of size 40 free'd
> ==21737==    at 0x401D139: free (vg_replace_malloc.c:233)
> ==21737==    by 0x408DE00: PyObject_Free (in /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x407BB4D: (within /usr/lib/libpython2.4.so.1.0)
> ==21737==    by 0x407A3D6: (within /usr/lib/libpython2.4.so.1.0)
>
> Here python reads from an already-freed memory area, right?
> (I don't think that Misc/README.valgrind answers this
> question). Or is it a false alarm?
>
> Thanks,
> Geza Herman
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> http://mail.python.org/mailman/options/python-dev/kristjan%40ccpgames.com
>

From martin at v.loewis.de  Tue Nov  7 15:31:18 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Nov 2006 15:31:18 +0100
Subject: [Python-Dev] valgrind
In-Reply-To: 
References: <4550256C.1020109@v.loewis.de>
Message-ID: <455098B6.3020903@v.loewis.de>

Neal Norwitz schrieb:
>    at 0x44FA06: Py_ADDRESS_IN_RANGE (obmalloc.c:1741)
>
> Note that the free is inside qsort.
The memory freed under qsort
> should definitely not be the bases which we allocated under
> PyType_Ready.  I'll file a bug report with valgrind to help determine
> if this is a problem in Python or valgrind.
> http://bugs.kde.org/show_bug.cgi?id=136989

As Tim explains, a read from Py_ADDRESS_IN_RANGE is fine, and by design.
If p is the pointer, we do

  pool = ((poolp)((Py_uintptr_t)(p) & ~(Py_uintptr_t)((4 * 1024) - 1)));

i.e. round down p to the start of the page, to obtain "pool". Then we do

  if (((pool)->arenaindex < maxarenas &&
       (Py_uintptr_t)(p) - arenas[(pool)->arenaindex].address < (Py_uintptr_t)(256 << 10) &&
       arenas[(pool)->arenaindex].address != 0))

i.e. access pool->arenaindex. If this is our own memory, we really find
a valid arena index there. If this is malloc'ed memory, we read garbage -
due to the page size, we are guaranteed to read successfully, still. To
determine whether it's garbage, we look it up in the arenas array.

> One other thing that is weird is that the complaint is about 4 bytes
> which should not be possible.  All pointers should be 8 bytes AFAIK
> since this is amd64.

That's because the arenaindex is an unsigned int. We could widen it to
size_t; if we don't, PyMalloc can "only" manage 1 PiB (with an arena
being 256kiB, and 4Gi arena indices being available).

> I also ran this on x86.  There were 32 errors and all of their
> addresses were 0x4...010.

That's because we round down to the beginning of the page.
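The two steps above — mask the pointer down to a (presumed) page boundary, then validate the arena index stored there against the arenas table — can be modelled with plain integer arithmetic. This is an illustrative sketch of the obmalloc check, not the real C code; the addresses and the arenas list are invented:

```python
POOL_SIZE = 4 * 1024       # SYSTEM_PAGE_SIZE assumed by obmalloc
ARENA_SIZE = 256 * 1024    # each arena is 256 KiB

def pool_base(addr):
    # pool = (poolp)((uptr)p & ~(uptr)(POOL_SIZE - 1)): round down to page.
    return addr & ~(POOL_SIZE - 1)

def address_in_range(addr, arenaindex_at_pool, arenas):
    # Model of Py_ADDRESS_IN_RANGE: the index read from the pool header is
    # trusted only if the arenas table confirms addr lies inside that arena.
    if arenaindex_at_pool >= len(arenas):
        return False
    base = arenas[arenaindex_at_pool]
    return base != 0 and 0 <= addr - base < ARENA_SIZE


arenas = [0x500000]                     # one fictitious arena base address
p = 0x512345                            # a pointer inside that arena
assert pool_base(p) == 0x512000
assert address_in_range(p, 0, arenas)
assert not address_in_range(0x900000, 0, arenas)
```

The point of the design is visible in the sketch: reading the index is always safe (same page as a live allocation), and whether it is garbage is decided afterwards by the table lookup.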
Regards,
Martin

From hg211 at hszk.bme.hu  Tue Nov  7 15:54:54 2006
From: hg211 at hszk.bme.hu (Herman Geza)
Date: Tue, 7 Nov 2006 15:54:54 +0100 (MET)
Subject: [Python-Dev] valgrind
In-Reply-To: <1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
Message-ID: 

On Tue, 7 Nov 2006, Tim Peters wrote:
> When PyObject_Free is handed an address it doesn't control, the "arena
> base address" it derives from that address may point at anything the
> system malloc controls, including uninitialized memory, memory the
> system malloc has allocated to something, memory the system malloc has
> freed, or internal system malloc bookkeeping bytes.  The
> Py_ADDRESS_IN_RANGE macro has no way to know before reading it up.
>
> So figure out which line of code valgrind is complaining about
> (doesn't valgrind usually produce that?).  If it's coming from the
> expansion of Py_ADDRESS_IN_RANGE, it's not worth more thought.

Hmm. I don't think that way. What if free() does other things? For
example, if free(addr) sees that the memory block at addr is the last
block, then it may call brk with a decreased end_data_segment. Or the
last block in an mmap'd area - it calls munmap. So when
Py_ADDRESS_IN_RANGE tries to read from this freed memory block it gets
SIGSEGV. However, I've never got SIGSEGV from python.

I don't really think that reading from an already-freed block is ever
legal. I asked my original question because I saw that I'm not the only
one who sees "Illegal reads" from python. Is valgrind wrong in this
case? I just want to be sure that I'll never get SIGSEGV from python.

Note that Misc/valgrind-python.supp contains suppressions for "Invalid
read"'s at Py_ADDRESS_IN_RANGE.
Geza Herman From tomerfiliba at gmail.com Tue Nov 7 16:41:53 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 7 Nov 2006 17:41:53 +0200 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: <454FBDC3.3060100@gmail.com> References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: <1d85506f0611070741u5aeb5507u4277f80fa325821d@mail.gmail.com> okay, everything's fixed. i updated the patch and added a small test to: Lib/test/test_builtins.py::test_dir -tomer On 11/7/06, Nick Coghlan wrote: > tomer filiba wrote: > > cool. first time i build the entire interpreter, 'twas fun :) > > currently i "retained" support for __members__ and __methods__, > > so it doesn't break anything and is compatible with 2.6. > > > > i really hope it will be included in 2.6 as today i'm using ugly hacks > > in RPyC to make remote objects appear like local ones. > > having __dir__ solves all of my problems. > > > > besides, it makes a lot of sense of define __dir__ for classes that > > define __getattr__. i don't think it should be pushed back to py3k. > > > > here's the patch: > > http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470 > > As I noted on the tracker, PyObject_Dir is a public C API function, so it's > behaviour needs to be preserved as well as the behaviour of calling dir() from > Python code. > > So the final form of the patch will likely need to include stronger tests for > that section of the API, as well as updating the documentation in various > places (the dir and PyObject_Dir documentation, obviously, but also the list > of magic methods in the language reference). > > +1 on targeting 2.6, too. > > Cheers, > Nick. 
> > -- > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia > --------------------------------------------------------------- > http://www.boredomandlaziness.org > From tomerfiliba at gmail.com Tue Nov 7 16:43:45 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Tue, 7 Nov 2006 17:43:45 +0200 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: <1d85506f0611070741u5aeb5507u4277f80fa325821d@mail.gmail.com> References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> <1d85506f0611070741u5aeb5507u4277f80fa325821d@mail.gmail.com> Message-ID: <1d85506f0611070743j2d400f14hcb3d7802172f93d@mail.gmail.com> > as well as updating the documentation in various > places (the dir and PyObject_Dir documentation, obviously, but also the list > of magic methods in the language reference). oops, i meant everything except that -tomer On 11/7/06, tomer filiba wrote: > okay, everything's fixed. > i updated the patch and added a small test to: > Lib/test/test_builtins.py::test_dir > > > -tomer > > On 11/7/06, Nick Coghlan wrote: > > tomer filiba wrote: > > > cool. first time i build the entire interpreter, 'twas fun :) > > > currently i "retained" support for __members__ and __methods__, > > > so it doesn't break anything and is compatible with 2.6. > > > > > > i really hope it will be included in 2.6 as today i'm using ugly hacks > > > in RPyC to make remote objects appear like local ones. > > > having __dir__ solves all of my problems. > > > > > > besides, it makes a lot of sense of define __dir__ for classes that > > > define __getattr__. i don't think it should be pushed back to py3k. 
> > >
> > > here's the patch:
> > > http://sourceforge.net/tracker/index.php?func=detail&aid=1591665&group_id=5470&atid=305470
> >
> > As I noted on the tracker, PyObject_Dir is a public C API function, so its
> > behaviour needs to be preserved as well as the behaviour of calling dir() from
> > Python code.
> >
> > So the final form of the patch will likely need to include stronger tests for
> > that section of the API, as well as updating the documentation in various
> > places (the dir and PyObject_Dir documentation, obviously, but also the list
> > of magic methods in the language reference).
> >
> > +1 on targeting 2.6, too.
> >
> > Cheers,
> > Nick.
> >
> > --
> > Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
> > ---------------------------------------------------------------
> > http://www.boredomandlaziness.org
> >

From ronaldoussoren at mac.com  Tue Nov  7 16:45:37 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Tue, 7 Nov 2006 16:45:37 +0100
Subject: [Python-Dev] Inconvenient filename in sandbox/decimal-c/new_dt
Message-ID: <74DDCAB6-7DB0-4943-AB0D-532F3E36FBE6@mac.com>

Hi,

I'm having problems with updating the sandbox.

ilithien:~/Python/sandbox-trunk ronald$ svn cleanup
ilithien:~/Python/sandbox-trunk ronald$ svn up
A    import_in_py/mock_importer.py
U    import_in_py/test_importer.py
U    import_in_py/importer.py
svn: Failed to add file 'decimal-c/new_dt/rounding.decTest': object of the same name already exists

This is on a 10.4.8 box with a recent version of subversion. It turns
out this is caused by a testcase file: decimal-c/new_dt contains both
remainderNear.decTest and remaindernear.decTest (the filenames differ
by case only). Is this intentional? This makes it impossible to do a
checkout on a system with a case insensitive filesystem.

Ronald
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s Type: application/pkcs7-signature Size: 3562 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061107/5ab7d1c8/attachment.bin From martin at v.loewis.de Tue Nov 7 17:33:44 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 07 Nov 2006 17:33:44 +0100 Subject: [Python-Dev] valgrind In-Reply-To: References: <4550256C.1020109@v.loewis.de> <1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com> Message-ID: <4550B568.6000805@v.loewis.de> Herman Geza schrieb: >> So figure out which line of code valgrind is complaining about >> (doesn't valgrind usually produce that?). If it's coming from the >> expansion of Py_ADDRESS_IN_RANGE, it's not worth more thought. > > Hmm. I don't think that way. What if free() does other things? It can't, as the hardware won't support it. > For example > if free(addr) sees that the memory block at addr is the last block then it > may call brk with a decreased end_data_segment. It can't. In brk, you can only manage memory in chunks of "one page" (i.e. 4kiB on x86). Since we only access memory on the same page, access is guaranteed to succeed. > Or the last block > in an mmap'd area - it calls unmap. So when Py_ADDRESS_IN_RANGE tries > to read from this freed memory block it gets SIGSEGV. However, I've never > got SIGSEGV from python. Likewise. This is guaranteed to work, by the processor manufacturers. > I don't really think that reading from an already-freed block is ever > legal. Define "legal". There is no law against it; you don't go to jail for doing it. What other penalties would you expect (other than valgrind spitting out error messages, and users complaining from time to time that it's "illegal")? > I asked my original question because I saw that I'm not the only > one who sees "Illegal reads" from python. Is valgrind wrong in this case? If it is this case, then no, valgrind is right. 
Notice that valgrind doesn't call them "illegal"; it calls them "invalid".

> I just want to be sure that I'll never get SIGSEGV from python.

You at least won't get SIGSEGVs from that part of the code.

> Note that Misc/valgrind-python.supp contains suppressions for "Invalid read"'s
> at Py_ADDRESS_IN_RANGE.

Right. This is to tell valgrind that these reads are known to work
as designed.

Regards,
Martin

From martin at v.loewis.de  Tue Nov  7 17:42:23 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Nov 2006 17:42:23 +0100
Subject: [Python-Dev] Inconvenient filename in sandbox/decimal-c/new_dt
In-Reply-To: <74DDCAB6-7DB0-4943-AB0D-532F3E36FBE6@mac.com>
References: <74DDCAB6-7DB0-4943-AB0D-532F3E36FBE6@mac.com>
Message-ID: <4550B76F.1090800@v.loewis.de>

Ronald Oussoren schrieb:
> This is on a 10.4.8 box with a recent version of subversion. It turns
> out this is caused by a testcase file: decimal-c/new_dt contains both
> remainderNear.decTest and remaindernear.decTest (the filenames differ by
> case only). Is this intentional?

I don't think so. The files differed only in the version: field, and
remainderNear.decTest is the same as the Python trunk, so I removed
remaindernear.decTest as bogus.

Regards,
Martin

From hg211 at hszk.bme.hu  Tue Nov  7 18:09:46 2006
From: hg211 at hszk.bme.hu (Herman Geza)
Date: Tue, 7 Nov 2006 18:09:46 +0100 (MET)
Subject: [Python-Dev] valgrind
In-Reply-To: <4550B568.6000805@v.loewis.de>
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
	<4550B568.6000805@v.loewis.de>
Message-ID: 

> > For example
> > if free(addr) sees that the memory block at addr is the last block then it
> > may call brk with a decreased end_data_segment.
>
> It can't. In brk, you can only manage memory in chunks of "one page"
> (i.e. 4kiB on x86). Since we only access memory on the same page,
> access is guaranteed to succeed.

Yes, I'm aware of it. But logically, it is possible, isn't it?
At malloc(), libc recognizes that brk is needed, it calls sbrk(4096).
Suppose that python releases this very same block immediately. At free(),
libc recognizes that sbrk(-4096) could be executed, so the freed block is
not available anymore (even for reading)

> > Or the last block
> > in an mmap'd area - it calls munmap. So when Py_ADDRESS_IN_RANGE tries
> > to read from this freed memory block it gets SIGSEGV. However, I've never
> > got SIGSEGV from python.
>
> Likewise. This is guaranteed to work, by the processor manufacturers.

The same: if the freed block is the last one in the mmap'd area, libc may
unmap it, doesn't it?

> > I don't really think that reading from an already-freed block is ever
> > legal.
>
> Define "legal". There is no law against it; you don't go to jail for
> doing it. What other penalties would you expect (other than valgrind
> spitting out error messages, and users complaining from time to time
> that it's "illegal")?

Ok, sorry about the strong word "legal".

> > I asked my original question because I saw that I'm not the only
> > one who sees "Illegal reads" from python. Is valgrind wrong in this case?
>
> If it is this case, then no, valgrind is right. Notice that valgrind
> doesn't call them "illegal"; it calls them "invalid".

> > I just want to be sure that I'll never get SIGSEGV from python.
>
> You at least won't get SIGSEGVs from that part of the code.

That's what I still don't understand. If valgrind is right then how can
python be sure that it can still reach a freed block?

> > Note that Misc/valgrind-python.supp contains suppressions for "Invalid read"'s
> > at Py_ADDRESS_IN_RANGE.
>
> Right. This is to tell valgrind that these reads are known to work
> as designed.

Does this mean that python strongly depends on libc? If I want to port
python to another platform which uses a totally different malloc, is
Py_ADDRESS_IN_RANGE guaranteed to work or do I have to make some changes?
(actually I'm porting python to another platform; that's why I'm asking
these questions, not because I'm finical or something)

Thanks,
Geza Herman

From martin at v.loewis.de  Tue Nov  7 18:50:04 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Tue, 07 Nov 2006 18:50:04 +0100
Subject: [Python-Dev] valgrind
In-Reply-To: 
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
	<4550B568.6000805@v.loewis.de>
Message-ID: <4550C74C.1060402@v.loewis.de>

Herman Geza schrieb:
>> It can't. In brk, you can only manage memory in chunks of "one page"
>> (i.e. 4kiB on x86). Since we only access memory on the same page,
>> access is guaranteed to succeed.
> Yes, I'm aware of it. But logically, it is possible, isn't it?

No, it isn't.

> At malloc(), libc recognizes that brk is needed, it calls sbrk(4096).
> Suppose that python releases this very same block immediately. At free(),
> libc recognizes that sbrk(-4096) could be executed, so the freed block is
> not available anymore (even for reading)

That can't happen for a different reason. When this access occurs, we
still have a pointer to allocated memory (either allocated through
malloc, or obmalloc - we don't know at the pointer where the access is
made). The access is "invalid" only if the memory was allocated through
malloc. So when the access is made, we have a pointer p, which is
allocated through malloc, and access p-3000 (say, assuming that p-3000
is a page boundary). Since p is still allocated, libc *cannot* have made
sbrk(p-3000), since that would have released the still-allocated block.

>>> Or the last block
>>> in an mmap'd area - it calls munmap. So when Py_ADDRESS_IN_RANGE tries
>>> to read from this freed memory block it gets SIGSEGV. However, I've never
>>> got SIGSEGV from python.
>> Likewise. This is guaranteed to work, by the processor manufacturers.
> The same: if the freed block is the last one in the mmap'd area, libc may > unmap it, doesn't it? But it isn't. We still have an allocated block of memory on the same page. The C library can't have released it. >>> I just want to be sure that I'll never get SIGSEGV from python. >> You least won't get SIGSEGVs from that part of the code. > That's what I still don't understand. If valgrind is right then how can > python be sure that it can still reach a freed block? valgrind knows the block is released. We know that the block is still "mapped" to memory by the operating system. These are different properties. To write to memory, you better have allocated it. To read from memory, it ought to be mapped (in most applications, it is also an error to read from released memory, even if the read operation succeeds; valgrind reports this error as "invalid read"). >>> Note that Misc/valgrind-python.supp contains suppressions "Invalid read"'s >>> at Py_ADDRESS_IN_RANGE. >> Right. This is to tell valgrind that these reads are known to work >> as designed. > Does this mean that python strongly depends on libc? No. It strongly depends on a lower estimate of the page size, and that memory is mapped on page boundaries. > If I want to port > python to another platform which uses a totally different malloc, is > Py_ADDRESS_IN_RANGE guaranteed to work or do I have to make some changes? It's rather unimportant how malloc is implemented. The real question is whether you have a flat address space (Python likely won't work at all if you don't have a flat address space), and whether the system either doesn't have virtual memory, or, if it does, whether obmalloc's guess of the page size is either right or an underestimation. If some constraints fail, you can't use obmalloc (you could still port Python, to not use obmalloc). Notice that on a system with limited memory, you probably don't want to use obmalloc, even if it worked. 
obmalloc uses arenas of 256kiB, which might be expensive on the target
system.

Out of curiosity: what is your target system?

Regards,
Martin

From hg211 at hszk.bme.hu  Tue Nov  7 19:56:31 2006
From: hg211 at hszk.bme.hu (Herman Geza)
Date: Tue, 7 Nov 2006 19:56:31 +0100 (MET)
Subject: [Python-Dev] valgrind
In-Reply-To: <4550C74C.1060402@v.loewis.de>
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
	<4550B568.6000805@v.loewis.de>
	<4550C74C.1060402@v.loewis.de>
Message-ID: 

Thanks Martin, now everything is clear. Python always reads from the
page where the about-to-be-freed block is located (that was the
information that I missed) - as such, it never causes a SIGSEGV.

Geza Herman

From tim.peters at gmail.com  Tue Nov  7 21:34:45 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 7 Nov 2006 15:34:45 -0500
Subject: [Python-Dev] valgrind
In-Reply-To: <129CEF95A523704B9D46959C922A2800047D98E3@nemesis.central.ccp.cc>
References: <129CEF95A523704B9D46959C922A2800047D98E3@nemesis.central.ccp.cc>
Message-ID: <1f7befae0611071234q468cd8ccn46290ebadc26ae16@mail.gmail.com>

[Kristján V. Jónsson]
> ...
> Actually, obmalloc could be improved in this aspect.  Similar code that I once wrote
> computed the block base address, but then looked in its tables to see if it was
> actually a known block before accessing it.

Several such schemes were tried (based on, e.g., binary search and
splay trees), but discarded due to measurable sloth.  The overwhelming
advantage of the current scheme is that it does the check in constant
time, independent of how many distinct arenas (whether one or thousands
makes no difference) pymalloc is managing.

> That way you can have blocks that are larger than the virtual memory block
> of the process.

If you have a way to do the check in constant time, that would be good.
Otherwise speed rules here.
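To put the trade-off above in code: the table scheme Kristján describes needs a search over the known arenas, while reading the index stored in the pool header is one array access regardless of how many arenas exist. A toy model of both checks — the arena addresses are made up, and this is not the actual obmalloc code:

```python
import bisect

ARENA_SIZE = 256 * 1024  # 256 KiB per arena, as in obmalloc

def owned_by_table(addr, sorted_bases):
    # Table-lookup scheme: binary-search sorted arena base addresses.
    # Cost grows as O(log n) with the number of arenas.
    i = bisect.bisect_right(sorted_bases, addr) - 1
    return i >= 0 and addr - sorted_bases[i] < ARENA_SIZE

def owned_by_index(addr, idx, arenas):
    # obmalloc-style scheme: 'idx' is read from the pool header; one
    # bounds check and one array access, constant time for any arena count.
    return idx < len(arenas) and 0 <= addr - arenas[idx] < ARENA_SIZE


bases = sorted(0x100000 + k * 0x100000 for k in range(1000))
assert owned_by_table(0x100010, bases)       # inside the first arena
assert not owned_by_table(0x150000, bases)   # in the gap between arenas
assert owned_by_index(0x100010, 0, bases)
```

The table scheme's compensating advantage, as Kristján notes, is that it never reads memory it doesn't own, so tools like Purify and valgrind stay quiet.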
From tim.peters at gmail.com  Tue Nov  7 22:01:01 2006
From: tim.peters at gmail.com (Tim Peters)
Date: Tue, 7 Nov 2006 16:01:01 -0500
Subject: [Python-Dev] valgrind
In-Reply-To: <4550C74C.1060402@v.loewis.de>
References: <4550256C.1020109@v.loewis.de>
	<1f7befae0611062320l369e7576l1f6d3eabd5cc07fe@mail.gmail.com>
	<4550B568.6000805@v.loewis.de>
	<4550C74C.1060402@v.loewis.de>
Message-ID: <1f7befae0611071301oaf34eebma466c7128bcf3e5a@mail.gmail.com>

[Martin v. Löwis]

Thanks for explaining all this!  One counterpoint:

> Notice that on a system with limited memory, you probably don't
> want to use obmalloc, even if it worked. obmalloc uses arenas
> of 256kiB, which might be expensive on the target system.

OTOH, Python allocates a lot of small objects, and one of the reasons
for obmalloc's existence is that it typically uses memory more
efficiently (less bookkeeping space overhead and less fragmentation)
for mounds of small objects than the all-purpose system malloc.

In a current (trunk) debug build, simply starting Python hits an arena
highwater mark of 9, and doing "python -S" instead hits a highwater
mark of 2.  Given how much memory Python needs to do nothing ;-), it's
doubtful that the system malloc would be doing better.

From grig.gheorghiu at gmail.com  Tue Nov  7 23:33:04 2006
From: grig.gheorghiu at gmail.com (Grig Gheorghiu)
Date: Tue, 7 Nov 2006 14:33:04 -0800
Subject: [Python-Dev] test_ucn fails for trunk on x86 Ubuntu Edgy
Message-ID: <3f09d5a00611071433w7b1f28d2gdffc314fb02e6a72@mail.gmail.com>

One of the Pybots buildslaves running x86 Ubuntu Edgy has been failing
the unit test step for the trunk, specifically the test_ucn test.
Here's the error:

test_ucn
test test_ucn failed -- Traceback (most recent call last):
  File "/home/pybot/pybot/trunk.bear-x86/build/Lib/test/test_ucn.py", line 102, in test_bmp_characters
    self.assertEqual(unicodedata.lookup(name), char)
KeyError: "undefined character name 'EIGHT PETALLED OUTLINED BLACK FLORETTE'"

Here's the entire log for the failed step:

http://www.python.org/dev/buildbot/community/all/x86%20Ubuntu%20Edgy%20trunk/builds/142/step-test/0

Note that this test passes on all the other platforms running in the
Pybots farm, including an amd64 Ubuntu Edgy machine.

Looks like the failure started to happen after this checkin:
http://svn.python.org/view?rev=52621&view=rev

Grig

-- 
http://agiletesting.blogspot.com

From greg.ewing at canterbury.ac.nz  Wed Nov  8 02:38:02 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Nov 2006 14:38:02 +1300
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <20061106135751.GA29592@code0.codespeak.net>
References: <454CB619.7010804@v.loewis.de>
	<20061106135751.GA29592@code0.codespeak.net>
Message-ID: <455134FA.9000001@canterbury.ac.nz>

Armin Rigo wrote:

It would seem good practice to remove all .pycs after checking out a
new version of the source, just in case there are other problems such
as mismatched timestamps, which can cause the same trouble.

> My two
> cents is that it would be saner to have two separate concepts: cache
> files used internally by the interpreter for speed reasons only, and
> bytecode files that can be shipped and imported.

That's a possibility.
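The "cache files used internally for speed reasons only" half of Armin's split could be as simple as keying compiled code on the source file's stat data and silently recompiling on any mismatch. A sketch of the idea only — the function and cache layout are invented here, not anything Python actually implements:

```python
import os

# Hypothetical in-process cache: source path -> ((mtime, size), code object)
_code_cache = {}

def load_compiled(path):
    """Return a code object for path, recompiling whenever the source changed."""
    st = os.stat(path)
    key = (st.st_mtime, st.st_size)
    cached = _code_cache.get(path)
    if cached is None or cached[0] != key:
        # Stale or missing: fall back to the source, then refresh the cache.
        with open(path) as f:
            cached = (key, compile(f.read(), path, "exec"))
        _code_cache[path] = cached
    return cached[1]
```

Because a stale or missing entry silently falls back to the source, a cache like this can never be shipped in place of it — which is exactly the separation from shippable bytecode files being argued for.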
-- 
Greg

From greg.ewing at canterbury.ac.nz  Wed Nov  8 02:38:18 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Nov 2006 14:38:18 +1300
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <454FC791.7080106@v.loewis.de>
References: <454CB619.7010804@v.loewis.de>
	<454D3C9E.5030505@canterbury.ac.nz>
	<454D5703.5070509@v.loewis.de>
	<454E74ED.8070706@canterbury.ac.nz>
	<454EDAEA.7050501@v.loewis.de>
	<454FC320.9050604@canterbury.ac.nz>
	<454FC791.7080106@v.loewis.de>
Message-ID: <4551350A.60607@canterbury.ac.nz>

Martin v. Löwis wrote:
> Currently, you can put a file on disk and import it
> immediately; that will stop working.

One thing I should add is that if you try to import a module that
wasn't there before, the interpreter will notice this and has the
opportunity to update its idea of what's on the disk. Likewise, if you
delete a module, the interpreter will notice when it tries to open a
file that no longer exists.

The only change would be if you added a module that shadowed something
formerly visible further along sys.path -- in between starting the
program and attempting to import it for the first time. So I don't
think there would be any visible change as far as most people could
tell.

-- 
Greg

From greg.ewing at canterbury.ac.nz  Wed Nov  8 02:38:28 2006
From: greg.ewing at canterbury.ac.nz (Greg Ewing)
Date: Wed, 08 Nov 2006 14:38:28 +1300
Subject: [Python-Dev] Importing .pyc in -O mode and vice versa
In-Reply-To: <2773CAC687FD5F4689F526998C7E4E5FF1EB56@au3010avexu1.global.avaya.com>
References: <2773CAC687FD5F4689F526998C7E4E5FF1EB56@au3010avexu1.global.avaya.com>
Message-ID: <45513514.6090400@canterbury.ac.nz>

Delaney, Timothy (Tim) wrote:
> Would it be reasonable to always do a stat() on the directory,
> reloading if there's been a change? Would this be reliable across
> platforms?

To detect a new shadowing you'd have to stat all the directories along
sys.path, not just the one you think the file is in.
That might wipe out most of the advantage. It would be different on platforms which provide a way of "watching" a directory and getting notified of changes. I think MacOSX, Linux and Windows all provide some way of doing that nowadays, although I'm not familiar with the details. -- Greg From python-dev at zesty.ca Wed Nov 8 03:20:52 2006 From: python-dev at zesty.ca (Ka-Ping Yee) Date: Tue, 7 Nov 2006 20:20:52 -0600 (CST) Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <20061106135751.GA29592@code0.codespeak.net> References: <454CB619.7010804@v.loewis.de> <20061106135751.GA29592@code0.codespeak.net> Message-ID: On Mon, 6 Nov 2006, Armin Rigo wrote: > I know it's a discussion that comes up and dies out regularly. My two > cents is that it would be saner to have two separate concepts: cache > files used internally by the interpreter for speed reasons only, and > bytecode files that can be shipped and imported. I like this approach. Bringing source code and program behaviour closer together makes debugging easier, and if someone wants to run Python programs without source code, then EIBTI. -- ?!ng From kbk at shore.net Wed Nov 8 05:31:22 2006 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Tue, 7 Nov 2006 23:31:22 -0500 (EST) Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200611080431.kA84VM59025651@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 430 open ( -4) / 3447 closed (+17) / 3877 total (+13) Bugs : 922 open ( -7) / 6316 closed (+31) / 7238 total (+24) RFE : 245 open ( +0) / 241 closed ( +1) / 486 total ( +1) New / Reopened Patches ______________________ modulefinder changes for py3k (2006-10-27) CLOSED http://python.org/sf/1585966 opened by Thomas Heller no wraparound for enumerate() (2006-10-28) CLOSED http://python.org/sf/1586315 opened by Georg Brandl missing imports ctypes in documentation examples (2006-09-13) CLOSED http://python.org/sf/1557890 reopened by theller better error msgs for some TypeErrors (2006-10-29) http://python.org/sf/1586791 opened by Georg Brandl cookielib: lock acquire/release try..finally protected (2006-10-30) http://python.org/sf/1587139 opened by kxroberto Patch for #1586414 to avoid fragmentation on Windows (2006-10-31) http://python.org/sf/1587674 opened by Enoch Julias Typo in Mac installer image name (2006-11-01) CLOSED http://python.org/sf/1589013 opened by Humberto Di?genes Typo in Mac image name (2006-11-01) CLOSED http://python.org/sf/1589014 opened by Humberto Di?genes MacPython Build Installer - Typos and Style corrections (2006-11-02) CLOSED http://python.org/sf/1589070 opened by Humberto Di?genes bdist_sunpkg distutils command (2006-11-02) http://python.org/sf/1589266 opened by Holger The "lazy strings" patch (2006-11-04) http://python.org/sf/1590352 opened by Larry Hastings adding __dir__ (2006-11-06) http://python.org/sf/1591665 opened by ganges master `in` for classic object causes segfault (2006-11-07) http://python.org/sf/1591996 opened by Hirokazu Yamamoto PyErr_CheckSignals returns -1 on error, not 1 (2006-11-07) http://python.org/sf/1592072 opened by Gustavo J. A. M. 
Carneiro Add missing elide argument to Text.search (2006-11-07) http://python.org/sf/1592250 opened by Russell Owen Patches Closed ______________ Fix for structmember conversion issues (2006-08-30) http://python.org/sf/1549049 closed by loewis Enable SSL for smtplib (2006-09-28) http://python.org/sf/1567274 closed by loewis Mailbox will not lock properly after flush() (2006-10-11) http://python.org/sf/1575506 closed by akuchling urllib2 - Fix line breaks in authorization headers (2006-10-09) http://python.org/sf/1574068 closed by akuchling Tiny patch to stop make spam (2006-06-09) http://python.org/sf/1503717 closed by akuchling modulefinder changes for py3k (2006-10-27) http://python.org/sf/1585966 closed by gvanrossum unparse.py decorator support (2006-09-04) http://python.org/sf/1552024 closed by gbrandl no wraparound for enumerate() (2006-10-28) http://python.org/sf/1586315 closed by rhettinger missing imports ctypes in documentation examples (2006-09-13) http://python.org/sf/1557890 closed by theller missing imports ctypes in documentation examples (2006-09-13) http://python.org/sf/1557890 closed by nnorwitz tarfile.py: better use of TarInfo objects with longnames (2006-10-24) http://python.org/sf/1583880 closed by gbrandl tarfile depends on undocumented behaviour (2006-09-25) http://python.org/sf/1564981 closed by gbrandl Typo in Mac installer image name (2006-11-02) http://python.org/sf/1589013 closed by ronaldoussoren Typo in Mac image name (2006-11-01) http://python.org/sf/1589014 deleted by virtualspirit MacPython Build Installer - Typos and Style corrections (2006-11-02) http://python.org/sf/1589070 closed by ronaldoussoren bdist_rpm not able to compile multiple rpm packages (2004-11-04) http://python.org/sf/1060577 closed by loewis Remove inconsistent behavior between import and zipimport (2005-11-03) http://python.org/sf/1346572 closed by loewis Rational Reference Implementation (2002-10-02) http://python.org/sf/617779 closed by loewis Problem at the 
end of misformed mailbox (2002-11-03) http://python.org/sf/632934 closed by loewis New / Reopened Bugs ___________________ csv.reader.line_num missing 'new in 2.5' (2006-10-27) CLOSED http://python.org/sf/1585690 opened by Kent Johnson tarfile.extract() may cause file fragmentation on Windows XP (2006-10-28) http://python.org/sf/1586414 opened by Enoch Julias compiler module dont emit LIST_APPEND w/ list comprehension (2006-10-29) CLOSED http://python.org/sf/1586448 opened by sebastien Martini codecs.open problem with "with" statement (2006-10-28) CLOSED http://python.org/sf/1586513 opened by Shaun Cutts zlib/bz2_codec doesn't support incremental decoding (2006-10-29) CLOSED http://python.org/sf/1586613 opened by Topia hashlib documentation is insuficient (2006-10-29) CLOSED http://python.org/sf/1586773 opened by Marcos Daniel Marado Torres scipy gammaincinv gives incorrect answers (2006-10-31) CLOSED http://python.org/sf/1587679 opened by David J.C. MacKay quoted printable parse the sequence '= ' incorrectly (2006-10-31) http://python.org/sf/1588217 opened by Wai Yip Tung string subscripting not working on a specific string (2006-11-02) CLOSED http://python.org/sf/1588975 opened by Dan Aronson Unneeded constants left during optimization (2006-11-02) CLOSED http://python.org/sf/1589074 opened by Daniel ctypes XXX - add a crossref, at least (2006-11-02) CLOSED http://python.org/sf/1589328 opened by Jim Jewett urllib2 does local import of tokenize.py (2006-11-02) http://python.org/sf/1589480 reopened by drfarina urllib2 does local import of tokenize.py (2006-11-02) http://python.org/sf/1589480 opened by Daniel Farina __getattr__ = getattr crash (2006-11-03) CLOSED http://python.org/sf/1590036 opened by Brian Harring Error piping output between scripts on Windows (2006-11-03) http://python.org/sf/1590068 opened by Andrei where is zlib??? 
(2006-11-04) http://python.org/sf/1590592 opened by AKap mail message parsing glitch (2006-11-05) http://python.org/sf/1590744 opened by Mike python: Python/ast.c:541: seq_for_testlist: Assertion fails (2006-10-31) CLOSED http://python.org/sf/1588287 opened by Tom Epperly python: Python/ast.c:541: seq_for_testlist: Assertion (2006-11-05) CLOSED http://python.org/sf/1590804 opened by Jay T Miller subprocess deadlock (2006-11-05) http://python.org/sf/1590864 opened by Michael Tsai random.randrange don't return correct value for big number (2006-11-06) http://python.org/sf/1590891 opened by MATSUI Tetsushi update urlparse to RFC 3986 (2006-11-05) http://python.org/sf/1591035 opened by Andrew Dalke problem building python in vs8express (2006-11-05) http://python.org/sf/1591122 reopened by thomashsouthern problem building python in vs8express (2006-11-05) http://python.org/sf/1591122 opened by Thomas Southern replace groups doesn't work in this special case (2006-11-06) CLOSED http://python.org/sf/1591319 opened by Thomas K. 
Urllib2.urlopen() raises OSError w/bad HTTP Location header (2006-11-07) http://python.org/sf/1591774 opened by nikitathespider Undocumented implicit strip() in split(None) string method (2005-01-19) http://python.org/sf/1105286 reopened by yohell Stepping into a generator throw does not work (2006-11-07) http://python.org/sf/1592241 opened by Bernhard Mulder Bugs Closed ___________ python_d python (2006-09-21) http://python.org/sf/1563243 closed by sf-robot glob.glob("c:\\[ ]\*) doesn't work (2006-10-19) http://python.org/sf/1580472 closed by gbrandl structmember T_LONG won't accept a python long (2006-08-24) http://python.org/sf/1545696 closed by loewis T_ULONG -> double rounding in PyMember_GetOne() (2006-09-27) http://python.org/sf/1566140 closed by loewis Different behavior when stepping through code w/ pdb (2006-10-24) http://python.org/sf/1583276 closed by jpe csv.reader.line_num missing 'new in 2.5' (2006-10-27) http://python.org/sf/1585690 closed by akuchling asyncore.dispatcher.set_reuse_addr not documented. 
(2006-09-20) http://python.org/sf/1562583 closed by akuchling inconsistency in PCALL conditional code in ceval.c (2006-08-17) http://python.org/sf/1542016 closed by akuchling functools.wraps fails on builtins (2006-10-12) http://python.org/sf/1576241 closed by akuchling str(WindowsError) wrong (2006-10-12) http://python.org/sf/1576174 closed by theller does not raise SystemError on too many nested blocks (2006-09-25) http://python.org/sf/1565514 closed by nnorwitz curses module segfaults on invalid tparm arguments (2006-08-28) http://python.org/sf/1548092 closed by nnorwitz "from __future__ import foobar;" causes wrong SyntaxError (2006-08-19) http://python.org/sf/1543306 closed by nnorwitz compiler module dont emit LIST_APPEND w/ list comprehension (2006-10-28) http://python.org/sf/1586448 closed by gbrandl distutils adds (unwanted) -xcode=pic32 in the compile comman (2006-05-19) http://python.org/sf/1491574 closed by nnorwitz codecs.open problem with "with" statement (2006-10-29) http://python.org/sf/1586513 closed by gbrandl suprocess cannot handle shell arguments (2005-11-16) http://python.org/sf/1357915 closed by gbrandl zlib/bz2_codec doesn't support incremental decoding (2006-10-29) http://python.org/sf/1586613 closed by gbrandl missing __enter__ + __getattr__ forwarding (2006-10-20) http://python.org/sf/1581357 closed by gbrandl hashlib documentation is insuficient (2006-10-29) http://python.org/sf/1586773 closed by gbrandl scipy gammaincinv gives incorrect answers (2006-10-31) http://python.org/sf/1587679 closed by loewis string subscripting not working on a specific string (2006-11-02) http://python.org/sf/1588975 closed by gbrandl ctypes XXX - add a crossref, at least (2006-11-02) http://python.org/sf/1589328 closed by theller dict keyerror formatting and tuples (2006-10-13) http://python.org/sf/1576657 closed by gbrandl __getattr__ = getattr crash (2006-11-03) http://python.org/sf/1590036 closed by arigo potential buffer overflow in complexobject.c 
(2006-10-13) http://python.org/sf/1576861 closed by sf-robot inspect.py imports local "tokenize.py" file (2006-11-02) http://python.org/sf/1589480 closed by loewis python: Python/ast.c:541: seq_for_testlist: Assertion fails (2006-10-31) http://python.org/sf/1588287 closed by nnorwitz python: Python/ast.c:541: seq_for_testlist: Assertion (2006-11-05) http://python.org/sf/1590804 closed by loewis TypeError message on bad iteration is misleading (2005-04-21) http://python.org/sf/1187437 closed by gbrandl replace groups doesn't work in this special case (2006-11-06) http://python.org/sf/1591319 closed by niemeyer unchecked metaclass mro (2006-09-28) http://python.org/sf/1567234 closed by akuchling curses getkey() crash in raw mode (2004-02-09) http://python.org/sf/893250 closed by akuchling RFE Closed __________ Unneeded constants left during optimization (2006-11-02) http://python.org/sf/1589074 closed by loewis From martin at v.loewis.de Wed Nov 8 06:18:43 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Nov 2006 06:18:43 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <4551350A.60607@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> <454FC791.7080106@v.loewis.de> <4551350A.60607@canterbury.ac.nz> Message-ID: <455168B3.3040809@v.loewis.de> Greg Ewing schrieb: > One thing I should add is that if you try to import > a module that wasn't there before, the interpreter will > notice this and has the opportunity to update its idea > of what's on the disk. How will it notice that it wasn't there before? 
The interpreter will see that it hasn't imported the module; it can't know whether it was there before while trying to resolve the import: when looking at a directory in sys.path, it needs to decide whether to use the directory cache or not. If the directory is not in the cache, it might be one of three things: a) the directory cache is out of date, and you should re-read the directory b) the module still isn't there, but is available in a later directory on sys.path (which hasn't yet been visited) c) the module isn't there at all, and the import will eventually fail. How can the interpreter determine which of these it is? Regards, Martin From martin at v.loewis.de Wed Nov 8 06:54:26 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 08 Nov 2006 06:54:26 +0100 Subject: [Python-Dev] test_ucn fails for trunk on x86 Ubuntu Edgy In-Reply-To: <3f09d5a00611071433w7b1f28d2gdffc314fb02e6a72@mail.gmail.com> References: <3f09d5a00611071433w7b1f28d2gdffc314fb02e6a72@mail.gmail.com> Message-ID: <45517112.3060509@v.loewis.de> Grig Gheorghiu schrieb: > One of the Pybots buildslaves running x86 Ubuntu Edgy has been failing > the unit test step for the trunk, specifically the test_ucn test. Something is wrong with the machine. I forced a clean rebuild, and now it crashes in test_doctest2: http://www.python.org/dev/buildbot/community/all/x86%20Ubuntu%20Edgy%20trunk/builds/145/step-test/0 So either the compiler or some library has been updated in a strange way, or there is a hardware problem. One would need access to the machine to find out (and analyzing it is likely time-consuming). Regards, Martin From nnorwitz at gmail.com Wed Nov 8 07:04:10 2006 From: nnorwitz at gmail.com (Neal Norwitz) Date: Tue, 7 Nov 2006 22:04:10 -0800 Subject: [Python-Dev] valgrind In-Reply-To: <455098B6.3020903@v.loewis.de> References: <4550256C.1020109@v.loewis.de> <455098B6.3020903@v.loewis.de> Message-ID: On 11/7/06, "Martin v. 
L?wis" wrote: > Neal Norwitz schrieb: > > at 0x44FA06: Py_ADDRESS_IN_RANGE (obmalloc.c:1741) > > > > Note that the free is inside qsort. The memory freed under qsort > > should definitely not be the bases which we allocated under > > PyType_Ready. I'll file a bug report with valgrind to help determine > > if this is a problem in Python or valgrind. > > http://bugs.kde.org/show_bug.cgi?id=136989 > > As Tim explains, a read from Py_ADDRESS_IN_RANGE is fine, and by design. > If p is the pointer, we do Yeah, thanks for going over it again. I was tired and only half paying attention last night. Tonight isn't going much better. :-( I wonder if we can capture any of these exchanges and put into README.valgrind. I'm not about to do it tonight though. n From mwh at python.net Wed Nov 8 16:07:41 2006 From: mwh at python.net (Michael Hudson) Date: Wed, 08 Nov 2006 16:07:41 +0100 Subject: [Python-Dev] Last chance to join the Summer of PyPy! Message-ID: <87fycurn7m.fsf@starship.python.net> Hopefully by now you have heard of the "Summer of PyPy", our program for funding the expenses of attending a sprint for students. If not, you've just read the essence of the idea :-) However, the PyPy EU funding period is drawing to an end and there is now only one sprint left where we can sponsor the travel costs of interested students within our program. This sprint will probably take place in Leysin, Switzerland from 8th-14th of January 2007. So, as explained in more detail at: http://codespeak.net/pypy/dist/pypy/doc/summer-of-pypy.html we would encourage any interested students to submit a proposal in the next month or so. If you're stuck for ideas, you can find some at http://codespeak.net/pypy/dist/pypy/doc/project-ideas.html but please do not feel limited in any way by this list! Cheers, mwh ... 
and the PyPy team -- the highest calling of technical book writers is to destroy the sun -- from Twisted.Quotes From grig.gheorghiu at gmail.com Wed Nov 8 16:27:44 2006 From: grig.gheorghiu at gmail.com (Grig Gheorghiu) Date: Wed, 8 Nov 2006 07:27:44 -0800 Subject: [Python-Dev] test_ucn fails for trunk on x86 Ubuntu Edgy In-Reply-To: <45517112.3060509@v.loewis.de> References: <3f09d5a00611071433w7b1f28d2gdffc314fb02e6a72@mail.gmail.com> <45517112.3060509@v.loewis.de> Message-ID: <3f09d5a00611080727k4e300871h2af920acd3867b8a@mail.gmail.com> On 11/7/06, "Martin v. L?wis" wrote: > Grig Gheorghiu schrieb: > > One of the Pybots buildslaves running x86 Ubuntu Edgy has been failing > > the unit test step for the trunk, specifically the test_ucn test. > > Something is wrong with the machine. I forced a clean rebuild, and > now it crashes in test_doctest2: > > http://www.python.org/dev/buildbot/community/all/x86%20Ubuntu%20Edgy%20trunk/builds/145/step-test/0 > > So either the compiler or some library has been updated in a strange > way, or there is a hardware problem. One would need access to the > machine to find out (and analyzing it is likely time-consuming). > > Regards, > Martin > Thanks for looking into it. I'll contact the owner of that machine and we'll try to figure out what's going on. Grig From greg.ewing at canterbury.ac.nz Thu Nov 9 00:29:43 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 09 Nov 2006 12:29:43 +1300 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <455168B3.3040809@v.loewis.de> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> <454FC791.7080106@v.loewis.de> <4551350A.60607@canterbury.ac.nz> <455168B3.3040809@v.loewis.de> Message-ID: <45526867.70004@canterbury.ac.nz> Martin v. 
L?wis wrote: > a) the directory cache is out of date, and you should > re-read the directory > b) the module still isn't there, but is available in > a later directory on sys.path (which hasn't yet > been visited) > c) the module isn't there at all, and the import will > eventually fail. > > How can the interpreter determine which of these it > is? It doesn't need to - if there is no file for the module in the cache, it assumes that the cache could be out of date and rebuilds it. If that turns up a file, then fine, else the module doesn't exist. BTW, I'm not thinking of cacheing individual directories, but scanning all the directories and building a single qualified_module_name -> pathname mapping. If the cache gets invalidated, all the directories along the path are re-scanned, so a new module will be picked up wherever it is on the path. -- Greg From martin at v.loewis.de Thu Nov 9 06:11:13 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 06:11:13 +0100 Subject: [Python-Dev] Importing .pyc in -O mode and vice versa In-Reply-To: <45526867.70004@canterbury.ac.nz> References: <454CB619.7010804@v.loewis.de> <454D3C9E.5030505@canterbury.ac.nz> <454D5703.5070509@v.loewis.de> <454E74ED.8070706@canterbury.ac.nz> <454EDAEA.7050501@v.loewis.de> <454FC320.9050604@canterbury.ac.nz> <454FC791.7080106@v.loewis.de> <4551350A.60607@canterbury.ac.nz> <455168B3.3040809@v.loewis.de> <45526867.70004@canterbury.ac.nz> Message-ID: <4552B871.2010303@v.loewis.de> Greg Ewing schrieb: > Martin v. L?wis wrote: > >> a) the directory cache is out of date, and you should >> re-read the directory >> b) the module still isn't there, but is available in >> a later directory on sys.path (which hasn't yet >> been visited) >> c) the module isn't there at all, and the import will >> eventually fail. >> >> How can the interpreter determine which of these it >> is? 
> > It doesn't need to - if there is no file for the module > in the cache, it assumes that the cache could be out > of date and rebuilds it. If that turns up a file, then > fine, else the module doesn't exist. I lost track. I thought we were talking about creating a cache of directory listings, not a stat cache? If you invalidate the cache when a file name is not listed, you will invalidate it on nearly every import, and multiple times, too: Python looks for foo.py, foo.pyc, foo.so, foomodule.so. At most one of them is found, the others aren't. So if foo.so would be found, are you invalidating the cache because foo.py isn't? > BTW, I'm not thinking of cacheing individual directories, > but scanning all the directories and building a single > qualified_module_name -> pathname mapping. If the cache > gets invalidated, all the directories along the path > are re-scanned, so a new module will be picked up > wherever it is on the path. That won't work well with path import objects. You have to observe the order in which sys.path is scanned, for correct semantics. Regards, Martin From martin at v.loewis.de Thu Nov 9 06:30:42 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 06:30:42 +0100 Subject: [Python-Dev] Using SCons for cross-compilation Message-ID: <4552BD02.2090808@v.loewis.de> Patch #841454 takes a stab at cross-compilation (for MingW32 on a Linux system, in this case), and proposes to use SCons instead of setup.py to compile extension modules. Usage of SCons would be restricted to cross-compilation (for the moment). What do you think? 
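For list readers who haven't seen SCons: its build descriptions are ordinary Python scripts named SConstruct, which is part of the appeal for cross-builds — the toolchain is just data in an Environment. A generic illustration only (the MinGW compiler prefix and module name below are made up for the example, not taken from the patch):

```python
# SConstruct -- SCons build files are plain Python.
# Cross-compiling means overriding the toolchain in the Environment;
# the compiler name here is a hypothetical MinGW cross-compiler.
env = Environment(CC='i586-mingw32msvc-gcc')
env.SharedLibrary('spam', ['spammodule.c'])
```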
Regards, Martin From anthony at interlink.com.au Thu Nov 9 07:45:30 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Thu, 9 Nov 2006 17:45:30 +1100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <4552BD02.2090808@v.loewis.de> References: <4552BD02.2090808@v.loewis.de> Message-ID: <200611091745.31443.anthony@interlink.com.au> On Thursday 09 November 2006 16:30, Martin v. L?wis wrote: > Patch #841454 takes a stab at cross-compilation > (for MingW32 on a Linux system, in this case), > and proposes to use SCons instead of setup.py > to compile extension modules. Usage of SCons > would be restricted to cross-compilation (for > the moment). > > What do you think? So we'd now have 3 places to update when things change (setup.py, PCbuild area, SCons)? How does this deal with the problems that autoconf has with cross-compilation? It would seem to me that just fixing the extension module building is a tiny part of the problem... or am I missing something? Anthony -- Anthony Baxter It's never too late to have a happy childhood. From amk at amk.ca Thu Nov 9 15:01:46 2006 From: amk at amk.ca (A.M. Kuchling) Date: Thu, 9 Nov 2006 09:01:46 -0500 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <20061109135115.15FA81E4006@bag.python.org> References: <20061109135115.15FA81E4006@bag.python.org> Message-ID: <20061109140146.GB8808@localhost.localdomain> On Thu, Nov 09, 2006 at 02:51:15PM +0100, andrew.kuchling wrote: > Author: andrew.kuchling > Date: Thu Nov 9 14:51:14 2006 > New Revision: 52692 > > [Patch #1514544 by David Watson] use fsync() to ensure data is really on disk Should I backport this change to 2.5.1? Con: The patch adds two new internal functions, _sync_flush() and _sync_close(), so it's an internal API change. Pro: it's a patch that should reduce chances of data loss, which is important to people processing mailboxes. 
Because it fixes a small chance of potential data loss and the new functions are prefixed with _, my personal inclination would be to backport this change. Comments? Anthony, do you want to pronounce on this issue? --amk From barry at python.org Thu Nov 9 16:07:23 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Nov 2006 10:07:23 -0500 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <20061109140146.GB8808@localhost.localdomain> References: <20061109135115.15FA81E4006@bag.python.org> <20061109140146.GB8808@localhost.localdomain> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 9, 2006, at 9:01 AM, A.M. Kuchling wrote: > Should I backport this change to 2.5.1? Con: The patch adds two new > internal functions, _sync_flush() and _sync_close(), so it's an > internal API change. Pro: it's a patch that should reduce chances of > data loss, which is important to people processing mailboxes. > > Because it fixes a small chance of potential data loss and the new > functions are prefixed with _, my personal inclination would be to > backport this change. I agree. _ is a hint as to its non-publicness and I don't have a problem in principle adding such methods. In this particular case, it seems the patch improves reliability, so +1. 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRVNEL3EjvBPtnXfVAQIb0QP+Nmd6XPKQeeXaHrAG/fAFjrVHFn4SFkhH PtJqnLVOAQeSDonDdBQKluypGdWktcpGM/r1mz51cpJhxytYnAbwqeu1LyWJ/maX ABxG6zrkd7YCjZ5VyK2VQNs2dSLVWYYH24V/xwP5E5D2sEQ80sII3mnydSO+KLVI HBg9jztsc70= =Sj0Q -----END PGP SIGNATURE----- From skip at pobox.com Thu Nov 9 16:12:15 2006 From: skip at pobox.com (skip at pobox.com) Date: Thu, 9 Nov 2006 09:12:15 -0600 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <200611091745.31443.anthony@interlink.com.au> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> Message-ID: <17747.17743.429254.590319@montanaro.dyndns.org> Anthony> So we'd now have 3 places to update when things change Anthony> (setup.py, PCbuild area, SCons)? Four. You forgot Modules/Setup... Skip From david at boddie.org.uk Thu Nov 9 16:42:48 2006 From: david at boddie.org.uk (David Boddie) Date: Thu, 9 Nov 2006 16:42:48 +0100 Subject: [Python-Dev] Using SCons for cross-compilation Message-ID: <200611091642.48998.david@boddie.org.uk> On Thu Nov 9 07:45:30 CET 2006, Anthony Baxter wrote: > On Thursday 09 November 2006 16:30, Martin v. L?wis wrote: > > Patch #841454 takes a stab at cross-compilation > > (for MingW32 on a Linux system, in this case), > > and proposes to use SCons instead of setup.py > > to compile extension modules. Usage of SCons > > would be restricted to cross-compilation (for > > the moment). > > > > What do you think? > > So we'd now have 3 places to update when things change (setup.py, PCbuild > area, SCons)? How does this deal with the problems that autoconf has with > cross-compilation? It would seem to me that just fixing the extension module > building is a tiny part of the problem... or am I missing something? I've been working on adding cross-compiling support to Python's build system, too, though I've had the luxury of building on Linux for a target platform that also runs Linux. 
Since the build system originally came from the GCC project, it shouldn't surprise anyone that there's already a certain level of support for cross-compilation built in. Simply setting the --build and --host options is a good start, for example. It seems that Martin's patch solves some problems I encountered more cleanly (in certain respects) than the solutions I came up with. Here are some issues I encountered (from memory): * The current system assumes that Parser/pgen will be built using the compiler being used for the rest of the build. This obviously isn't going to work when the executable is meant for the target platform. At the same time, the object files for pgen need to be compiled for the interpreter for the target platform. * The newly-compiled interpreter is used to compile the standard library, run tests and execute the setup.py file. Some of these things should be done by the interpreter, but it won't work on the host platform. On the other hand, the setup.py script should be run by the host's Python interpreter, but using information about the target interpreter's configuration. * There are various extensions defined in the setup.py file that are found and erroneously included if you execute it using the host's interpreter. Ideally, it would be possible to use the target's configuration to disable extensions, but a more configurable build process would also be welcome. I'll try to look at Martin's patch at some point. I hope these observations and suggestions help explain the current issues with the build system when cross-compiling. 
David From chris at kateandchris.net Thu Nov 9 17:29:37 2006 From: chris at kateandchris.net (Chris Lambacher) Date: Thu, 9 Nov 2006 11:29:37 -0500 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <200611091642.48998.david@boddie.org.uk> References: <200611091642.48998.david@boddie.org.uk> Message-ID: <20061109162937.GA3812@kateandchris.net> On Thu, Nov 09, 2006 at 04:42:48PM +0100, David Boddie wrote: > On Thu Nov 9 07:45:30 CET 2006, Anthony Baxter wrote: > > > On Thursday 09 November 2006 16:30, Martin v. L?wis wrote: > > > Patch #841454 takes a stab at cross-compilation > > > (for MingW32 on a Linux system, in this case), > > > and proposes to use SCons instead of setup.py > > > to compile extension modules. Usage of SCons > > > would be restricted to cross-compilation (for > > > the moment). > > > > > > What do you think? > > > > So we'd now have 3 places to update when things change (setup.py, PCbuild > > area, SCons)? How does this deal with the problems that autoconf has with > > cross-compilation? It would seem to me that just fixing the extension module > > building is a tiny part of the problem... or am I missing something? > > I've been working on adding cross-compiling support to Python's build system, > too, though I've had the luxury of building on Linux for a target platform > that also runs Linux. Since the build system originally came from the GCC > project, it shouldn't surprise anyone that there's already a certain level > of support for cross-compilation built in. Simply setting the --build and > --host options is a good start, for example. > > It seems that Martin's patch solves some problems I encountered more cleanly > (in certain respects) than the solutions I came up with. Here are some > issues I encountered (from memory): > > * The current system assumes that Parser/pgen will be built using the > compiler being used for the rest of the build. 
This obviously isn't > going to work when the executable is meant for the target platform. > At the same time, the object files for pgen need to be compiled for > the interpreter for the target platform. > > * The newly-compiled interpreter is used to compile the standard library, > run tests and execute the setup.py file. Some of these things should > be done by the interpreter, but it won't work on the host platform. > On the other hand, the setup.py script should be run by the host's > Python interpreter, but using information about the target interpreter's > configuration. > > * There are various extensions defined in the setup.py file that are > found and erroneously included if you execute it using the host's > interpreter. Ideally, it would be possible to use the target's > configuration to disable extensions, but a more configurable build > process would also be welcome. > This pretty much covers the difficulties I encountered. For what it's worth, my experiences with Python 2.5 are documented here: I am also interested in pursuing solutions that make it easier to both build python and third party extensions in cross compile environment. > I'll try to look at Martin's patch at some point. I hope these observations > and suggestions help explain the current issues with the build system when > cross-compiling. > > David > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/chris%40kateandchris.net From tjreedy at udel.edu Thu Nov 9 19:54:00 2006 From: tjreedy at udel.edu (Terry Reedy) Date: Thu, 9 Nov 2006 13:54:00 -0500 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk:Lib/mailbox.py Misc/NEWS References: <20061109135115.15FA81E4006@bag.python.org> <20061109140146.GB8808@localhost.localdomain> Message-ID: "A.M. 
Kuchling" wrote in message news:20061109140146.GB8808 at localhost.localdomain... > On Thu, Nov 09, 2006 at 02:51:15PM +0100, andrew.kuchling wrote: >> Author: andrew.kuchling >> Date: Thu Nov 9 14:51:14 2006 >> New Revision: 52692 >> >> [Patch #1514544 by David Watson] use fsync() to ensure data is really on >> disk > > Should I backport this change to 2.5.1? Con: The patch adds two new > internal functions, _sync_flush() and _sync_close(), so it's an > internal API change. Pro: it's a patch that should reduce chances of > data loss, which is important to people processing mailboxes. I am not familiar with the context but I would naively think of data loss as a bug. The new functions' code could be preceded by a comment that they were added in 2.5.1 for internal use only and that external use would make code incompatible with 2.5 -- and of course, not documented elsewhere. tjr From martin at v.loewis.de Thu Nov 9 20:02:12 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 20:02:12 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <200611091745.31443.anthony@interlink.com.au> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> Message-ID: <45537B34.9010100@v.loewis.de> Anthony Baxter schrieb: > So we'd now have 3 places to update when things change (setup.py, PCbuild > area, SCons)? How does this deal with the problems that autoconf has with > cross-compilation? It would seem to me that just fixing the extension module > building is a tiny part of the problem... or am I missing something? I'm not quite sure. I believe distutils is too smart to support cross-compilation. It has its own notion of where to look for header files and how to invoke the compiler; these builtin assumptions break for cross-compilation. In any case, the patch being contributed uses SCons. If people think this is unmaintainable, this is a reason to reject the patch. 
Regards, Martin From skip at pobox.com Thu Nov 9 20:15:15 2006 From: skip at pobox.com (skip at pobox.com) Date: Thu, 9 Nov 2006 13:15:15 -0600 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <45537B34.9010100@v.loewis.de> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> Message-ID: <17747.32323.67824.681099@montanaro.dyndns.org> Martin> In any case, the patch being contributed uses SCons. If people Martin> think this is unmaintainable, this is a reason to reject the Martin> patch. Could SCons replace distutils? Skip From chris at kateandchris.net Thu Nov 9 20:27:17 2006 From: chris at kateandchris.net (Chris Lambacher) Date: Thu, 9 Nov 2006 14:27:17 -0500 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <17747.32323.67824.681099@montanaro.dyndns.org> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> Message-ID: <20061109192717.GA4353@kateandchris.net> On Thu, Nov 09, 2006 at 01:15:15PM -0600, skip at pobox.com wrote: > > Martin> In any case, the patch being contributed uses SCons. If people > Martin> think this is unmaintainable, this is a reason to reject the > Martin> patch. > > Could SCons replace distutils? If SCons replaced Distutils, would SCons have to become part of Python? Is SCons ready for that? What do you do about the existing body of 3rd-party extensions that are already using Distutils? Think of the resistance to the, relatively minor, changes that Setuptools made to the way Distutils works. I think a better question is what about Distutils hinders cross-compiler scenarios and how do we fix those deficiencies?
-Chris From martin at v.loewis.de Thu Nov 9 20:50:51 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 20:50:51 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <20061109192717.GA4353@kateandchris.net> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> <20061109192717.GA4353@kateandchris.net> Message-ID: <4553869B.8060203@v.loewis.de> Chris Lambacher schrieb: > I think a better question is what about Distutils hinders cross-compiler > scenarios and how to we fix those deficiencies? It's primarily the lack of contributions. Somebody would have to define a cross-compilation scenario (where "use Cygwin on Linux" is one that might be available to many people), and try to make it work. I believe it wouldn't work out of the box because distutils issues the wrong commands with the wrong command line options. But I don't know for sure; I haven't tried myself. Regards, Martin From skip at pobox.com Thu Nov 9 20:54:37 2006 From: skip at pobox.com (skip at pobox.com) Date: Thu, 9 Nov 2006 13:54:37 -0600 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <20061109192717.GA4353@kateandchris.net> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> <20061109192717.GA4353@kateandchris.net> Message-ID: <17747.34685.755754.32319@montanaro.dyndns.org> >> Could SCons replace distutils? Chris> If SCons replaced Distutils would SCons have to become part of Chris> Python? Is SCons ready for that? What do you do about the Chris> existing body 3rd party extensions that are already using Chris> Distutils? Sorry, my question was ambiguous. Let me rephrase it: Could SCons replace distutils as the way to build extension modules delivered with Python proper? 
In answer to your questions: * Yes, I believe so. * I have no idea what SCons is ready for. * I assume distutils would continue to ship with Python, so existing distutils-based setup.py install scripts should continue to work. Someone (I don't know who) submitted a patch to use SCons for building modules in cross-compilation contexts. Either the author tried to shoehorn this into distutils and failed or never tried (maybe because using SCons for such tasks is much easier - who knows?). I assume that if the patch is accepted, SCons would have to be bundled with Python. I don't see that as a big problem as long as there's someone to support it and it meets the basic requirements for inclusion (significant user base, documentation, test cases, release form). Given that SCons can apparently be coaxed into cross-compiling extension modules, I presume it should be relatively simple to do the same in a normal compilation environment. If that's the case, then why use distutils to build Python's core extension modules at all? Skip From martin at v.loewis.de Thu Nov 9 20:56:04 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 20:56:04 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <200611091642.48998.david@boddie.org.uk> References: <200611091642.48998.david@boddie.org.uk> Message-ID: <455387D4.20206@v.loewis.de> David Boddie schrieb: > It seems that Martin's patch solves some problems I encountered more cleanly > (in certain respects) than the solutions I came up with. Here are some > issues I encountered (from memory): Just let me point out that it is not my patch: http://python.org/sf/841454 was contributed by Andreas Ames. I performed triage on it (as it is about to reach its 3rd anniversary), and view SCons usage as the biggest obstacle.
Regards, Martin From martin at v.loewis.de Thu Nov 9 20:59:23 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 09 Nov 2006 20:59:23 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <17747.34685.755754.32319@montanaro.dyndns.org> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> <20061109192717.GA4353@kateandchris.net> <17747.34685.755754.32319@montanaro.dyndns.org> Message-ID: <4553889B.7050404@v.loewis.de> skip at pobox.com schrieb: > Someone (I don't know who) submitted a patch to use SCons for building > modules in cross-compilation contexts. Either the author tried to shoehorn > this into distutils and failed or never tried (maybe because using SCons for > such takss is much easier - who knows?). I assume that if the patch is > accepted that SCons would have to be bundled with Python. I don't see that as a requirement. People cross-compiling Python could be required to install SCons - they are used to install all kinds of things for a cross-compilation environment. In particular, to run SCons, they need a host python. The just-built python is unsuitable, as it only runs on the target. Regards, Martin From barry at python.org Thu Nov 9 22:19:19 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 9 Nov 2006 16:19:19 -0500 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: <17747.32323.67824.681099@montanaro.dyndns.org> References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 9, 2006, at 2:15 PM, skip at pobox.com wrote: > > Martin> In any case, the patch being contributed uses SCons. If > people > Martin> think this is unmaintainable, this is a reason to > reject the > Martin> patch. 
> > Could SCons replace distutils? I'm not so sure. I love SCons, but it has some unpythonic aspects to it, which (IMO) make sense as a standalone build tool, but not so much as a standard library module. I'd probably want to see some of those things improved if we were to use it to replace distutils. There does seem to be overlap between the two tools though, and it might make for an interesting sprint/project to find and refactor the commonality. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRVObXHEjvBPtnXfVAQIhQQP/esS6o+7NX/JenJcuEdvb7rWIVxRgzVEh rfZGSOO2mp6b0PgrvXjAnZQHYJFpQO5JXpWJVqLBPxbucbBwvWaA0+tgTrpnBpj9 Cs/vwlMsmk55CwSYjvl7eM0uW9aIuT9QcZxuf4j+T7dzQOL0LL2Id4/876Azcfo0 7A0dtc2oJ+U= =H1w2 -----END PGP SIGNATURE----- From pedronis at strakt.com Thu Nov 9 22:29:14 2006 From: pedronis at strakt.com (Samuele Pedroni) Date: Thu, 09 Nov 2006 22:29:14 +0100 Subject: [Python-Dev] Using SCons for cross-compilation In-Reply-To: References: <4552BD02.2090808@v.loewis.de> <200611091745.31443.anthony@interlink.com.au> <45537B34.9010100@v.loewis.de> <17747.32323.67824.681099@montanaro.dyndns.org> Message-ID: <45539DAA.7070701@strakt.com> Barry Warsaw wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Nov 9, 2006, at 2:15 PM, skip at pobox.com wrote: > > >> Martin> In any case, the patch being contributed uses SCons. If >> people >> Martin> think this is unmaintainable, this is a reason to >> reject the >> Martin> patch. >> >> Could SCons replace distutils? >> > > I'm not so sure. I love SCons, but it has some unpythonic aspects to > it, which (IMO) make sense as a standalone build tool, but not so > much as a standard library module. I'd probably want to see some of > those things improved if we were to use it to replace distutils. 
> > In PyPy we explored at some point using SCons instead of abusing distutils for our building needs. It seems to have a library part, but a lot of its high-level dependency logic seems to be coded into its main invocation script in a monolithic way, with a lot of global state. We didn't feel like trying to untangle that or explore more. > There does seem to be overlap between the two tools though, and it > might make for an interesting sprint/project to find and refactor the > commonality. > > - -Barry From anthony at interlink.com.au Fri Nov 10 01:56:25 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri, 10 Nov 2006 11:56:25 +1100 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <20061109140146.GB8808@localhost.localdomain> References: <20061109135115.15FA81E4006@bag.python.org> <20061109140146.GB8808@localhost.localdomain> Message-ID: <200611101156.29325.anthony@interlink.com.au> On Friday 10 November 2006 01:01, A.M. Kuchling wrote: > On Thu, Nov 09, 2006 at 02:51:15PM +0100, andrew.kuchling wrote: > > Author: andrew.kuchling > > Date: Thu Nov 9 14:51:14 2006 > > New Revision: 52692 > > > > [Patch #1514544 by David Watson] use fsync() to ensure data is really on > > disk > > Should I backport this change to 2.5.1?
Con: The patch adds two new > internal functions, _sync_flush() and _sync_close(), so it's an > internal API change. Pro: it's a patch that should reduce chances of > data loss, which is important to people processing mailboxes. > > Because it fixes a small chance of potential data loss and the new > functions are prefixed with _, my personal inclination would be to > backport this change. Looking at the patch, the functions are pretty clearly internal implementation details. I'm happy for it to go into release25-maint (particularly because the consequences of the bug are so dire). Anthony -- Anthony Baxter It's never too late to have a happy childhood. From amk at amk.ca Fri Nov 10 03:45:13 2006 From: amk at amk.ca (A.M. Kuchling) Date: Thu, 9 Nov 2006 21:45:13 -0500 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <200611101156.29325.anthony@interlink.com.au> References: <20061109135115.15FA81E4006@bag.python.org> <20061109140146.GB8808@localhost.localdomain> <200611101156.29325.anthony@interlink.com.au> Message-ID: <20061110024513.GB1739@Andrew-iBook2.local> On Fri, Nov 10, 2006 at 11:56:25AM +1100, Anthony Baxter wrote: > Looking at the patch, the functions are pretty clearly internal implementation > details. I'm happy for it to go into release25-maint (particularly because > the consequences of the bug are so dire). OK, I'll backport it; thanks! (It's not fixing a frequent data-loss problem -- the patch just assures that when flush() or close() returns, data is more likely to have been written to disk and be safe after a subsequent system crash.) 
--amk From anthony at interlink.com.au Fri Nov 10 04:36:52 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Fri, 10 Nov 2006 14:36:52 +1100 Subject: [Python-Dev] [Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS In-Reply-To: <20061110024513.GB1739@Andrew-iBook2.local> References: <20061109135115.15FA81E4006@bag.python.org> <200611101156.29325.anthony@interlink.com.au> <20061110024513.GB1739@Andrew-iBook2.local> Message-ID: <200611101436.52930.anthony@interlink.com.au> On Friday 10 November 2006 13:45, A.M. Kuchling wrote: > OK, I'll backport it; thanks! > > (It's not fixing a frequent data-loss problem -- the patch just > assures that when flush() or close() returns, data is more likely to > have been written to disk and be safe after a subsequent system > crash.) Sure - it's a potential bug waiting to happen, though. And it's not a fun one :) From paul.chiusano at gmail.com Sun Nov 5 18:36:35 2006 From: paul.chiusano at gmail.com (Paul Chiusano) Date: Sun, 5 Nov 2006 12:36:35 -0500 Subject: [Python-Dev] Status of pairing_heap.py? In-Reply-To: <20061104122150.81FF.JCARLSON@uci.edu> References: <454CE367.7000604@v.loewis.de> <20061104122150.81FF.JCARLSON@uci.edu> Message-ID: > It is not required. If you are careful, you can implement a pairing > heap with a structure combining a dictionary and list. That's interesting. Can you give an overview of how you can do that? I can't really picture it. You can support all the pairing heap operations with the same complexity guarantees? Do you mean a linked list here or an array? Paul On 11/4/06, Josiah Carlson wrote: > > "Martin v. L?wis" wrote: > > Paul Chiusano schrieb: > > > To support this, the insert method needs to return a reference to an > > > object which I can then pass to adjust_key() and delete() methods. > > > It's extremely difficult to have this functionality with array-based > > > heaps because the index of an item in the array changes as items are > > > inserted and removed. 
> > > > I see. > > It is not required. If you are careful, you can implement a pairing > heap with a structure combining a dictionary and list. It requires that > all values be unique and hashable, but it is possible (I developed one > for a commercial project). > > If other people find the need for it, I could rewrite it (can't release > the closed source). It would use far less memory than the pairing heap > implementation provided in the sandbox, and could be converted to C if > desired and/or required. On the other hand, I've found the pure Python > version to be fast enough for most things I've needed it for. > > - Josiah > > From kxroberto at googlemail.com Mon Nov 6 17:56:02 2006 From: kxroberto at googlemail.com (Robert) Date: Mon, 06 Nov 2006 17:56:02 +0100 Subject: [Python-Dev] Feature Request: Py_NewInterpreter to create separate GIL (branch) References: ca471dc20611052052s2cfe3461l7265b7a2aeae5b3@mail.gmail.com Message-ID: <454F6922.20503@googlemail.com> Talin wrote: >>/ I don't know how you define simple. In order to be able to have />>/ separate GILs you have to remove *all* sharing of objects between />>/ interpreters. And all other data structures, too. It would probably />>/ kill performance too, because currently obmalloc relies on the GIL. / > Nitpick: You have to remove all sharing of *mutable* objects. One day, > when we get "pure" GC with no refcounting, that will be a meaningful > distinction. :) Is it mad?: It could be a distinction now: immutables/singletons refcount could be held ~fix around MAXINT easily (by a loose periodic GC scheme, or by Py_INC/DEFREF to be like { if ob.refcount!=MAXINT ... ) dicty things like Exception.x=5 could either be disabled or Exception.refcount=MAXINT/.__dict__=lockingdict ... or exceptions could be doubled as they don't have to cross the bridge (weren't they in an ordinary python module once ?). 
obmalloc.c/LOCK() could be something fast like: _retry: __asm LOCK INC malloc_lock if (malloc_lock!=1) { LOCK DEC malloc_lock; /*yield();*/ goto _retry; } To know the final speed costs ( http://groups.google.de/group/comp.lang.python/msg/01cef42159fd1712 ) would require an experiment. Cheap signal processors (<1%) don't need to be supported for free threading interpreters. Builtin/Extension modules global __dict__ to become a lockingdict. Yet a speedy LOCK INC lock method may possibly lead to general free threading threads (for most CPUs) at all. Almost all Python objects have static/uncritical attributes/require only few locks. A full blown LOCK INC lock method on dict & list accesses, (avoidable for fastlocals?) & defaulty Py_INC/DECREF (as far as there is still refcounting in Py3K). Py_FASTINCREF could be fast for known immutables (mainly Py_None) with MAXINT method, and for fresh creations etc. PyThreadState_GET(): A ts(PyThread_get_thread_ident())/*TlsGetValue() would become necessary. Is there a fast thread_ID register in today's CPUs?* Robert From fredrik at pythonware.com Fri Nov 10 18:21:35 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 10 Nov 2006 18:21:35 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: Guido van Rossum wrote: > No objection on targeting 2.6 if other developers agree. Seems this > is well under way. good work! given that dir() is used extensively by introspection tools, I'm not sure I'm positive to a __dir__ that *overrides* the standard dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. (what about vars(), btw?)
From guido at python.org Fri Nov 10 20:30:57 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Nov 2006 11:30:57 -0800 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: On 11/10/06, Fredrik Lundh wrote: > Guido van Rossum wrote: > > > No objection on targeting 2.6 if other developers agree. Seems this > > is well under way. good work! > > given that dir() is used extensively by introspection tools, I'm > not sure I'm positive to a __dir__ that *overrides* the standard > dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. I think that ought to go into the guidelines for what's an acceptable __dir__ implementation. We don't try to stop people from overriding __add__ as subtraction either. > (what about vars(), btw?) Interesting question! Right now vars() and dir() don't seem to use the same set of keys; e.g.: >>> class C: pass ... >>> c = C() >>> c.foo = 42 >>> vars(c) {'foo': 42} >>> dir(c) ['__doc__', '__module__', 'foo'] >>> It makes some sense for vars(x) to return something like dict((name, getattr(x, name)) for name in dir(x) if hasattr(x, name)) and for the following equivalence to hold between vars() and dir() without args: dir() == sorted(vars().keys()) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From theller at ctypes.org Fri Nov 10 20:41:28 2006 From: theller at ctypes.org (Thomas Heller) Date: Fri, 10 Nov 2006 20:41:28 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: <4554D5E8.3020600@ctypes.org> Fredrik Lundh schrieb: > Guido van Rossum wrote: > >> No objection on targeting 2.6 if other developers agree.
Seems this >> is well under way. good work! > > given that dir() is used extensively by introspection tools, I'm > not sure I'm positive to a __dir__ that *overrides* the standard > dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. One part that *I* would like about a complete overridable __dir__ implementation is that it would be nice to customize what help(something) prints. Thomas From fredrik at pythonware.com Fri Nov 10 21:25:02 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 10 Nov 2006 21:25:02 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: <4554D5E8.3020600@ctypes.org> References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> <4554D5E8.3020600@ctypes.org> Message-ID: Thomas Heller wrote: >>> No objection on targeting 2.6 if other developers agree. Seems this >>> is well under way. good work! >> >> given that dir() is used extensively by introspection tools, I'm >> not sure I'm positive to a __dir__ that *overrides* the standard >> dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. > > One part that *I* would like about a complete overridable __dir__ implementation > is that it would be nice to customize what help(something) prints. I don't think you should confuse reliable introspection with the help system, though. introspection is used for a lot more than implementing help().
From fredrik at pythonware.com Fri Nov 10 21:26:34 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Fri, 10 Nov 2006 21:26:34 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: Guido van Rossum wrote: > I think that ought to go into the guidelines for what's an acceptable > __dir__ implementation. We don't try to stop people from overriding > __add__ as subtraction either. to me, overriding dir() is a lot more like overriding id() than overriding "+". I don't think an object should be allowed to lie to the introspection mechanisms. From guido at python.org Fri Nov 10 22:12:19 2006 From: guido at python.org (Guido van Rossum) Date: Fri, 10 Nov 2006 13:12:19 -0800 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: On 11/10/06, Fredrik Lundh wrote: > Guido van Rossum wrote: > > > I think that ought to go into the guidelines for what's an acceptable > > __dir__ implementation. We don't try to stop people from overriding > > __add__ as subtraction either. > > to me, overriding dir() is a lot more like overriding id() than overriding "+". > I don't think an object should be allowed to lie to the > introspection mechanisms. Why not? You can override __class__ already. With a metaclass you can probably override inspection of the class, too.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From g.brandl at gmx.net Sat Nov 11 11:20:41 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Nov 2006 11:20:41 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: Guido van Rossum wrote: >> (what about vars(), btw?) > > Interesting question! Right now vars() and dir() don't seem to use the > same set of keys; e.g.: > >>>> class C: pass > ... >>>> c = C() >>>> c.foo = 42 >>>> vars(c) > {'foo': 42} >>>> dir(c) > ['__doc__', '__module__', 'foo'] >>>> > > It makes some sense for vars(x) to return something like > > dict((name, getattr(x, name)) for name in dir(x) if hasattr(x, name)) > > and for the following equivalence to hold between vars() and dir() without args: > > dir() == sorted(vars().keys()) +1. This is easy and straightforward to explain, better than "With a module, class or class instance object as argument (or anything else that has a __dict__ attribute), returns a dictionary corresponding to the object's symbol table." Georg From g.brandl at gmx.net Sat Nov 11 11:21:08 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 11 Nov 2006 11:21:08 +0100 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: Fredrik Lundh wrote: > Guido van Rossum wrote: > >> No objection on targeting 2.6 if other developers agree. Seems this >> is well under way. good work! > > given that dir() is used extensively by introspection tools, I'm > not sure I'm positive to a __dir__ that *overrides* the standard > dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable.
If the new default __dir__ implementation only yields the same set of attributes (or more), there should be no problem. If somebody overrides __dir__, he knows what he's doing. He will most likely do something like "return super.__dir__() + [my, custom, attributes]". regards, Georg From ncoghlan at gmail.com Sun Nov 12 04:34:29 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Sun, 12 Nov 2006 13:34:29 +1000 Subject: [Python-Dev] __dir__, part 2 In-Reply-To: References: <1d85506f0611060702h1795645cq777cceaf6e453246@mail.gmail.com> <1d85506f0611061355g223839fev3764b8f05caa81fd@mail.gmail.com> <454FBDC3.3060100@gmail.com> Message-ID: <45569645.1080206@gmail.com> Fredrik Lundh wrote: > Guido van Rossum wrote: > >> No objection on targeting 2.6 if other developers agree. Seems this >> is well under way. good work! > > given that dir() is used extensively by introspection tools, I'm > not sure I'm positive to a __dir__ that *overrides* the standard > dir() behaviour. *adding* to the default dir() list is okay, replacing it is a lot more questionable. If a class only overrides __getattr__, then I agree it should only add to __dir__ (most likely by using a super call as Georg suggests). If it overrides __getattribute__, however, then it can actually deliberately block access to attributes that would otherwise be accessible, so it may make sense for it to alter the basic result of dir() instead of just adding more attributes to the end. Cheers, Nick.
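On an interpreter that actually honours the hook under discussion (it landed in a later release), the cooperative style Georg and Nick describe might look like the following sketch. The `Proxy` class is a made-up example, not code from the thread:

```python
class Proxy:
    """Delegate attribute access to a wrapped object, and advertise the
    wrapped object's attributes through dir() as well."""

    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Called only for attributes not found the normal way;
        # forward them to the wrapped object.
        return getattr(self._target, name)

    def __dir__(self):
        # Extend the default listing rather than replace it, as the
        # thread recommends for classes that only add attributes.
        return sorted(set(super().__dir__()) | set(dir(self._target)))
```

dir(Proxy([])) then reports both the proxy's own attributes and list methods such as append, so introspection tools see the same names that __getattr__ will actually serve.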
-- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From martin at v.loewis.de Sun Nov 12 12:01:20 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Nov 2006 12:01:20 +0100 Subject: [Python-Dev] Passing floats to file.seek Message-ID: <4556FF00.3070108@v.loewis.de> Patch #1067760 deals with passing of float values to file.seek; the original version tries to fix the current implementation by converting floats to long long, rather than plain C long (thus supporting files larger than 2GiB). I propose a different approach: passing floats to seek should be an error. My version of the patch uses the index API, this will automatically give an error. Two questions: a) should floats be supported as parameters to file.seek b) if not, should Python 2.6 just deprecate such usage, or outright reject it? Regards, Martin From fredrik at pythonware.com Sun Nov 12 12:09:49 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 12 Nov 2006 12:09:49 +0100 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: <4556FF00.3070108@v.loewis.de> References: <4556FF00.3070108@v.loewis.de> Message-ID: Martin v. L?wis wrote: > Patch #1067760 deals with passing of float values to file.seek; > the original version tries to fix the current implementation > by converting floats to long long, rather than plain C long > (thus supporting files larger than 2GiB). > > I propose a different approach: passing floats to seek should > be an error. My version of the patch uses the index API, this > will automatically give an error. > > Two questions: > a) should floats be supported as parameters to file.seek I don't really see why. > b) if not, should Python 2.6 just deprecate such usage, > or outright reject it? Python 2.5 silently accepts (and truncates) a float that's within range, so a warning sounds like the right thing to do for 2.6. 
note that read already produces such a warning: >>> f = open("hello.txt") >>> f.seek(1.5) >>> f.read(1.5) __main__:1: DeprecationWarning: integer argument expected, got float 'e' From anthony at interlink.com.au Sun Nov 12 16:23:08 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Mon, 13 Nov 2006 02:23:08 +1100 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: References: <4556FF00.3070108@v.loewis.de> Message-ID: <200611130223.11460.anthony@interlink.com.au> On Sunday 12 November 2006 22:09, Fredrik Lundh wrote: > Martin v. L?wis wrote: > > Patch #1067760 deals with passing of float values to file.seek; > > the original version tries to fix the current implementation > > by converting floats to long long, rather than plain C long > > (thus supporting files larger than 2GiB). > > b) if not, should Python 2.6 just deprecate such usage, > > or outright reject it? > > Python 2.5 silently accepts (and truncates) a float that's within range, > so a warning sounds like the right thing to do for 2.6. note that read I agree that a warning seems best. If someone (for whatever reason) is flinging floats around where they actually meant to have ints, going straight to an error from silently truncating and accepting it seems a little bit harsh. Anthony -- Anthony Baxter It's never too late to have a happy childhood. From fredrik at pythonware.com Sun Nov 12 18:47:16 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 12 Nov 2006 18:47:16 +0100 Subject: [Python-Dev] ready-made timezones for the datetime module Message-ID: I guess I should remember, but what's the rationale for not including even a single concrete "tzinfo" implementation in the standard library? not even a UTC class? or am I missing something? 
From martin at v.loewis.de Sun Nov 12 20:14:47 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 12 Nov 2006 20:14:47 +0100 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: Message-ID: <455772A7.6090800@v.loewis.de> Fredrik Lundh schrieb: > I guess I should remember, but what's the rationale for not including > even a single concrete "tzinfo" implementation in the standard library? > > not even a UTC class? > > or am I missing something? If you are asking for a time-zone database, such as pytz (http://sourceforge.net/projects/pytz/), then I think there are two reasons for why no such code is included: a) such a database is not available in standard C, or even in POSIX. So it is not possible to provide this functionality by wrapping a widely-available library. b) no code to provide such functionality has been contributed. Normally, b) would be the bigger issue. In this case, I think there might also be resistance to including a large database (as usual when inclusion of some database is proposed). Regards, Martin From fredrik at pythonware.com Sun Nov 12 21:55:57 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 12 Nov 2006 21:55:57 +0100 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: <455772A7.6090800@v.loewis.de> References: <455772A7.6090800@v.loewis.de> Message-ID: Martin v. L?wis wrote: >> I guess I should remember, but what's the rationale for not including >> even a single concrete "tzinfo" implementation in the standard library? >> >> not even a UTC class? >> >> or am I missing something? 
> > If you are asking for a time-zone database I was more thinking of basic stuff like the UTC, FixedOffset and LocalTimezone classes from the library reference: http://docs.python.org/lib/datetime-tzinfo.html I just wrote a small RSS generator; it took more time to sort out how to get strftime("%z") to print something meaningful than it took to write the rest of the code. would anyone mind if I added the above classes to the datetime module ? From barry at python.org Sun Nov 12 22:16:23 2006 From: barry at python.org (Barry Warsaw) Date: Sun, 12 Nov 2006 16:16:23 -0500 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: <455772A7.6090800@v.loewis.de> Message-ID: <57828D04-F0DD-497A-AE11-BB7BC5FD675F@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 12, 2006, at 3:55 PM, Fredrik Lundh wrote: > would anyone mind if I added the above classes to the datetime > module ? +1. I mean, we have an example of UTC in the docs, so, er, why not include it in the stdlib?! - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRVePLHEjvBPtnXfVAQIyGAQAi18TdI55P1vDp7sTuHS7eQMZmXMAr4+M 8i2RpWZrxtgi4c21J/qiwEIoY3KdANiUyzb8PbScf8LuFzZZTiDPsuMuTDC8IhBR w6bvU/AOpsmWpkuSKyjPaVdgZlOQ8IsHOJUQtYAVDsfMCh4D0Y65jMHENi1gYzud JJky5a6DifM= =BxZL -----END PGP SIGNATURE----- From rasky at develer.com Mon Nov 13 00:09:40 2006 From: rasky at develer.com (Giovanni Bajo) Date: Mon, 13 Nov 2006 00:09:40 +0100 Subject: [Python-Dev] Summer of Code: zipfile? Message-ID: <099a01c706af$a21f1d20$ce09f01b@bagio> Hello, wasn't there a project about the zipfile module in the Summer of Code? How did it go? Giovanni Bajo From guido at python.org Mon Nov 13 02:23:57 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Nov 2006 17:23:57 -0800 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: Message-ID: IMO it was an oversight. Or we were all exhausted. 
I keep copying those three classes from the docs, which is silly. :-) On 11/12/06, Fredrik Lundh wrote: > I guess I should remember, but what's the rationale for not including > even a single concrete "tzinfo" implementation in the standard library? > > not even a UTC class? > > or am I missing something? -- --Guido van Rossum (home page: http://www.python.org/~guido/) From guido at python.org Mon Nov 13 02:25:32 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 12 Nov 2006 17:25:32 -0800 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: <200611130223.11460.anthony@interlink.com.au> References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> Message-ID: On 11/12/06, Anthony Baxter wrote: > On Sunday 12 November 2006 22:09, Fredrik Lundh wrote: > > Martin v. L?wis wrote: > > > Patch #1067760 deals with passing of float values to file.seek; > > > the original version tries to fix the current implementation > > > by converting floats to long long, rather than plain C long > > > (thus supporting files larger than 2GiB). > > > > b) if not, should Python 2.6 just deprecate such usage, > > > or outright reject it? > > > > Python 2.5 silently accepts (and truncates) a float that's within range, > > so a warning sounds like the right thing to do for 2.6. note that read > > I agree that a warning seems best. If someone (for whatever reason) is > flinging floats around where they actually meant to have ints, going straight > to an error from silently truncating and accepting it seems a little bit > harsh. Right. There seem to be people who believe that 1e6 is an int. -- --Guido van Rossum (home page: http://www.python.org/~guido/) From nnorwitz at gmail.com Mon Nov 13 05:18:00 2006 From: nnorwitz at gmail.com (Neal Norwitz) Date: Sun, 12 Nov 2006 20:18:00 -0800 Subject: [Python-Dev] Summer of Code: zipfile? 
In-Reply-To: <099a01c706af$a21f1d20$ce09f01b@bagio> References: <099a01c706af$a21f1d20$ce09f01b@bagio> Message-ID: You probably need to contact the authors for more info: https://svn.sourceforge.net/svnroot/ziparchive/ziparchive/trunk/ http://wiki.python.org/moin/SummerOfCode n -- On 11/12/06, Giovanni Bajo wrote: > Hello, > > wasn't there a project about the zipfile module in the Summer of Code? How did > it go? > > Giovanni Bajo > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/nnorwitz%40gmail.com > From fredrik at pythonware.com Mon Nov 13 08:48:46 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 13 Nov 2006 08:48:46 +0100 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: Message-ID: Guido van Rossum wrote: > IMO it was an oversight. Or we were all exhausted. I keep copying > those three classes from the docs, which is silly. :-) I'll whip up a patch. would the "embedded python module" approach I'm using for _elementtree be okay, or should this go into a support library ? From steve at holdenweb.com Mon Nov 13 11:08:17 2006 From: steve at holdenweb.com (Steve Holden) Date: Mon, 13 Nov 2006 04:08:17 -0600 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> Message-ID: Guido van Rossum wrote: > On 11/12/06, Anthony Baxter wrote: >> On Sunday 12 November 2006 22:09, Fredrik Lundh wrote: >>> Martin v. L?wis wrote: >>>> Patch #1067760 deals with passing of float values to file.seek; >>>> the original version tries to fix the current implementation >>>> by converting floats to long long, rather than plain C long >>>> (thus supporting files larger than 2GiB). 
>>>> b) if not, should Python 2.6 just deprecate such usage, >>>> or outright reject it? >>> Python 2.5 silently accepts (and truncates) a float that's within range, >>> so a warning sounds like the right thing to do for 2.6. note that read >> I agree that a warning seems best. If someone (for whatever reason) is >> flinging floats around where they actually meant to have ints, going straight >> to an error from silently truncating and accepting it seems a little bit >> harsh. > > Right. There seem to be people who believe that 1e6 is an int. > In which case an immediate transition to error status would seem to offer a way of providing an effective education. Deprecation may well be the best way to go for customer-friendliness, but anyone who believes 1e6 is an int should be hit with a stick. Next thing you know some damned fool is going to suggest that 1e6 gets parsed into a long integer. There, I feel better now. thank-you-for-listening-ly y'rs - steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From murman at gmail.com Mon Nov 13 15:23:43 2006 From: murman at gmail.com (Michael Urman) Date: Mon, 13 Nov 2006 08:23:43 -0600 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> Message-ID: On 11/13/06, Steve Holden wrote: > In which case an immediate transition to error status would seem to > offer a way of providing an effective education. Deprecation may well be > the best way to go for customer-friendliness, but anyone who believes > 1e6 is an int should be hit with a stick. Right, but what about those people who just didn't examine it? I consider myself a pretty good programmer, and was surprised by Guido's remark. A little quick self-education later, I understood. 
Still I find the implication that anyone using 1e6 for an integer should be (have all their users) beaten absurd in the context of backwards compatibility. Especially when they were using one of the less apparent floats in a place that accepted floats. Perhaps it would be a fine change for py3k. > Next thing you know some damned fool is going to suggest that 1e6 gets > parsed into a long integer. I can guess why it isn't, but it seems more a matter of ease than a matter of doing what's right. I had expected it to be an int because I thought of 1e6 as a shorthand for (1 * 10 ** 6), which is an int. 1e-6 would be (1 * 10 ** -6) which is a float. 1.0e6 would be (1.0 * 10 ** 6) which would also be a float. Clearly instead the e wins out as the format specifier. I'm not going to argue for it to be turned into an int, or even suggest it, after all compatibility with obscure realities of C is important. I'm just going to say that it makes more sense to me than your reaction indicates. -- Michael Urman http://www.tortall.net/mu/blog From skip at pobox.com Mon Nov 13 15:49:26 2006 From: skip at pobox.com (skip at pobox.com) Date: Mon, 13 Nov 2006 08:49:26 -0600 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> Message-ID: <17752.34294.638353.574267@montanaro.dyndns.org> >> Right. There seem to be people who believe that 1e6 is an int. ... Steve> Next thing you know some damned fool is going to suggest that 1e6 Steve> gets parsed into a long integer. Maybe in Py3k a decimal point should be required in floats using exponential notation - 1.e6 or 1.0e6 - with suitable deprecation warnings in 2.6+ about 1e6. 
Skip From steve at holdenweb.com Mon Nov 13 18:16:11 2006 From: steve at holdenweb.com (Steve Holden) Date: Mon, 13 Nov 2006 11:16:11 -0600 Subject: [Python-Dev] Passing floats to file.seek In-Reply-To: <17752.34294.638353.574267@montanaro.dyndns.org> References: <4556FF00.3070108@v.loewis.de> <200611130223.11460.anthony@interlink.com.au> <17752.34294.638353.574267@montanaro.dyndns.org> Message-ID: <4558A85B.8020304@holdenweb.com> skip at pobox.com wrote: > >> Right. There seem to be people who believe that 1e6 is an int. > ... > Steve> Next thing you know some damned fool is going to suggest that 1e6 > Steve> gets parsed into a long integer. > > Maybe in Py3k a decimal point should be required in floats using exponential > notation - 1.e6 or 1.0e6 - with suitable deprecation warnings in 2.6+ about > 1e6. > My remarks weren't entirely tongue in cheek. Once you have long integers seamlessly integrated there is a case to be made that if the mantissa is integral then the literal should have an integral value. Then, of course, we'll get people complaining about the length of time it takes to compute expressions containing huge integers. regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From guido at python.org Mon Nov 13 18:31:51 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 13 Nov 2006 09:31:51 -0800 Subject: [Python-Dev] ready-made timezones for the datetime module In-Reply-To: References: Message-ID: On 11/12/06, Fredrik Lundh wrote: > Guido van Rossum wrote: > > > IMO it was an oversight. Or we were all exhausted. I keep copying > > those three classes from the docs, which is silly. :-) > > I'll whip up a patch. would the "embedded python module" approach I'm > using for _elementtree be okay, or should this go into a support library ? 
I'll leave that to the 2.6 management; I don't know what you're talking about and would rather keep it that way. :-) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Tue Nov 14 22:51:23 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Tue, 14 Nov 2006 22:51:23 +0100 Subject: [Python-Dev] PyFAQ: help wanted with thread article Message-ID: (reposted from c.l.py) the following FAQ item talks about using sleep to make sure that threads run properly: http://effbot.org/pyfaq/none-of-my-threads-seem-to-run-why.htm I suspect it was originally written for the "thread" module, since as far as I know, the "threading" module takes care of the issues described here all by itself. so, should this item be removed? or can anyone suggest a rewrite that's more relevant for "threading" users? From amk at amk.ca Wed Nov 15 16:08:31 2006 From: amk at amk.ca (A.M. Kuchling) Date: Wed, 15 Nov 2006 10:08:31 -0500 Subject: [Python-Dev] Arlington sprint this Saturday Message-ID: <20061115150831.GA6153@rogue.amk.ca> The monthly Arlington VA sprint is this Saturday, November 18 2006, 9 AM - 6 PM. Please see http://wiki.python.org/moin/ArlingtonSprint for directions. --amk From martin at v.loewis.de Wed Nov 15 22:20:12 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 15 Nov 2006 22:20:12 +0100 Subject: [Python-Dev] 2.5 portability problems Message-ID: <455B848C.9070909@v.loewis.de> I'd like to share an observation on portability of extension modules to Python 2.5: python-ldap would crash on Solaris, see http://groups.google.com/group/comp.lang.python/msg/a678a969c90f21ab?dmode=source&hl=en It turns out that this was caused by a mismatch in malloc "families" (PyMem_Del vs. PyObject_Del): http://sourceforge.net/tracker/index.php?func=detail&aid=1575329&group_id=2072&atid=102072 So if Python 2.5 crashes in malloc/free, it's probably a good guess that some extension module failed to use correct APIs. 
There is probably not much we can do about this: it's already mentioned in "Porting to 2.5" of whatsnew25. It would be good if people were aware of this issue (and the other changes to the C API); thus I hope that this message/thread makes it to the python-dev summary :-) Regards, Martin From g.brandl at gmx.net Wed Nov 15 23:15:07 2006 From: g.brandl at gmx.net (Georg Brandl) Date: Wed, 15 Nov 2006 23:15:07 +0100 Subject: [Python-Dev] Results of the SOC projects Message-ID: Hi, this might seem a bit late, and perhaps I was just blind, but I miss something like a summary how the Python summer of code projects went, and what the status of the ones that were meant to improve the standard library, e.g. the C decimal implementation, is. cheers, Georg From nilton.volpato at gmail.com Thu Nov 16 20:37:10 2006 From: nilton.volpato at gmail.com (Nilton Volpato) Date: Thu, 16 Nov 2006 17:37:10 -0200 Subject: [Python-Dev] Summer of Code: zipfile? In-Reply-To: <099a01c706af$a21f1d20$ce09f01b@bagio> References: <099a01c706af$a21f1d20$ce09f01b@bagio> Message-ID: <27fef5640611161137y29ee8eb5g6cfa42c80195b1c@mail.gmail.com> Hi Giovanni, I'm the author of the new zipfile module, which has come to be named ziparchive. The SoC project was mentored by Ilya Etingof. It's available through sourceforge page [1,2], were you can download a package for it, and also through svn [3]. The current implementation is working nicely, and includes the initially proposed features, which includes: file-like access to zip members; support for BZIP2 compression; support for member file removal; and support for encryption. However, I'm not fully satisfied with the current API niceness (and some of its limitations), and I'm working on a somewhat new design, which will start within the next version. So, it would be very nice to get suggestions, ideas and criticism about the current version so that the next one can be better still. So, I encourage whoever is interested to download and try it. 
There are some examples in the code and in the project home page [2]. And, please, send some feedback, which will help make this the ultimate zip library for python. :-) [1] http://sourceforge.net/projects/ziparchive [2] http://ziparchive.sourceforge.net/ [3] https://svn.sourceforge.net/svnroot/ziparchive/ziparchive/ Cheers, -- Nilton On 11/12/06, Giovanni Bajo wrote: > Hello, > > wasn't there a project about the zipfile module in the Summer of Code? How did > it go? > > Giovanni Bajo From brett at python.org Thu Nov 16 20:41:16 2006 From: brett at python.org (Brett Cannon) Date: Thu, 16 Nov 2006 11:41:16 -0800 Subject: [Python-Dev] Results of the SOC projects In-Reply-To: References: Message-ID: On 11/15/06, Georg Brandl wrote: > > Hi, > > this might seem a bit late, and perhaps I was just blind, > but I miss something like a summary how the Python > summer of code projects went, and what the status of the ones > that were meant to improve the standard library, e.g. the > C decimal implementation, is. There was never a formal one to my knowledge. Part of the problem is that the PSF acted as a blanket organization this year so we just basically helped dole out slots to various Python projects. This meant it was not under very centralized control and thus not easy to track. Anyway, as for the python-dev projects, there is an email in another thread about the zip work. As for the adding of logging to the stdlib modules or the decimal in C, we need the mentors to step forward and say something about that. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061116/89a372cb/attachment.htm From fredrik at pythonware.com Thu Nov 16 21:49:34 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 16 Nov 2006 21:49:34 +0100 Subject: [Python-Dev] 2.5 portability problems In-Reply-To: <455B848C.9070909@v.loewis.de> References: <455B848C.9070909@v.loewis.de> Message-ID: Martin v. L?wis wrote: > I'd like to share an observation on portability of extension > modules to Python 2.5: python-ldap would crash on Solaris, see > > http://groups.google.com/group/comp.lang.python/msg/a678a969c90f21ab?dmode=source&hl=en > > It turns out that this was caused by a mismatch in malloc > "families" (PyMem_Del vs. PyObject_Del): I was just hit *hard* by this issue (in an extension that worked perfectly well under all test cases, and all but one demo script, which happened to be the only one that happened to do a certain trivial operation more than 222 times), so I added a FAQ entry: http://effbot.org/pyfaq/why-does-my-c-extension-suddenly-crash-under-2.5.htm feel free to add symptoms or other observations for other platforms and/or extensions. cheers /F From turnbull at sk.tsukuba.ac.jp Fri Nov 17 02:49:04 2006 From: turnbull at sk.tsukuba.ac.jp (Stephen J. Turnbull) Date: Fri, 17 Nov 2006 10:49:04 +0900 Subject: [Python-Dev] Results of the SOC projects In-Reply-To: References: Message-ID: <878xiaj10v.fsf@uwakimon.sk.tsukuba.ac.jp> Brett Cannon writes: > There was never a formal one to my knowledge. Part of the problem is that > the PSF acted as a blanket organization this year so we just basically > helped dole out slots to various Python projects. This meant it was not > under very centralized control and thus not easy to track. I don't think you need "centralization" or "control"; the Python mentors are all public spirited and responsible folks, right? 
It's just that report-writing is kind of unrewarding work, especially if you don't know what the report is supposed to be like (and haven't even been asked for them!) Why not have a wiki page for reports, and hand out a T-shirt or something like that to *mentors* who file their reports? Somebody at the PSF should sit down, think about what the report really needs to say from their point of view, and buy a pizza (as well as the T-shirt!) for somebody trusted to write a good but *minimal* report. Then point to that: "Here's the quality of prose and citation you need to aspire to, here's the minimum length and content you *must* include." Report-writing of this kind is for the *mentors*: you want to know who supervises well, and eventually do meta-mentoring. Of course the participants should be writing reports too, but this page should link to those reports. You'll get them; the mentor's T-shirt ("Somebody participated in the Summer of Code and all I got is this lousy T-shirt") is at stake! From kbk at shore.net Fri Nov 17 05:38:56 2006 From: kbk at shore.net (Kurt B. 
Kaiser) Date: Thu, 16 Nov 2006 23:38:56 -0500 (EST) Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200611170438.kAH4cu59022987@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 416 open (-14) / 3463 closed (+16) / 3879 total ( +2) Bugs : 930 open ( +8) / 6333 closed (+17) / 7263 total (+25) RFE : 244 open ( -1) / 244 closed ( +3) / 488 total ( +2) New / Reopened Patches ______________________ tkSimpleDialog freezes when apply raises exception (2006-11-11) http://python.org/sf/1594554 opened by Hirokazu Yamamoto Iterating closed StringIO.StringIO (2005-11-18) http://python.org/sf/1359365 reopened by doerwalter Cross compiling patches for MINGW (2006-11-16) http://python.org/sf/1597850 opened by Han-Wen Nienhuys Patches Closed ______________ `in` for classic object causes segfault (2006-11-07) http://python.org/sf/1591996 closed by loewis askyesnocancel helper for tkMessageBox (2005-11-08) http://python.org/sf/1351744 closed by loewis PyErr_CheckSignals returns -1 on error, not 1 (2006-11-07) http://python.org/sf/1592072 closed by gbrandl make pty.fork() allocate a controlling tty (2003-11-08) http://python.org/sf/838546 closed by loewis Add missing elide argument to Text.search (2006-11-07) http://python.org/sf/1592250 closed by loewis mailbox: use fsync() to ensure data is really on disk (2006-06-29) http://python.org/sf/1514544 closed by akuchling mailbox (Maildir): avoid losing messages on name clash (2006-06-29) http://python.org/sf/1514543 closed by akuchling Fix struct.pack on 64-bit archs (broken on 2.*) (2004-10-02) http://python.org/sf/1038854 closed by loewis Cross building python for mingw32 (2003-11-13) http://python.org/sf/841454 closed by loewis httplib: allowing stream-type body part in requests (2004-11-12) http://python.org/sf/1065257 closed by loewis support whence argument for GzipFile.seek (bug #1316069) (2005-11-12) http://python.org/sf/1355023 closed by loewis fix for 1067728: Better handling of float 
arguments to seek (2004-11-17) http://python.org/sf/1067760 closed by loewis ftplib transfer problem with certain servers (2005-11-17) http://python.org/sf/1359217 closed by loewis bdist_rpm still can't handle dashes in versions (2005-11-18) http://python.org/sf/1360200 closed by loewis Fix the vc8 solution files (2006-08-19) http://python.org/sf/1542946 closed by krisvale Practical ctypes example (2006-09-15) http://python.org/sf/1559219 closed by theller New / Reopened Bugs ___________________ Unfortunate naming of variable in heapq example (2006-11-08) CLOSED http://python.org/sf/1592533 opened by Martin Thorsen Ranang gettext has problems with .mo files that use non-ASCII chars (2006-11-08) CLOSED http://python.org/sf/1592627 opened by Russell Phillips replace groups doesn't work in this special case (2006-11-06) http://python.org/sf/1591319 reopened by tomek74 readline problem on ia64-unknown-linux-gnu (2006-11-08) http://python.org/sf/1593035 opened by Kate Minola No IDLE in Windows (2006-11-09) CLOSED http://python.org/sf/1593384 opened by A_V_I No IDLE in Windows (2006-11-09) CLOSED http://python.org/sf/1593407 opened by A_V_I No IDLE in Windows (2006-11-09) CLOSED http://python.org/sf/1593442 opened by A_V_I site-packages isn't created before install_egg_info (2006-09-28) CLOSED http://python.org/sf/1566719 reopened by loewis Modules/unicodedata.c contains C++-style comment (2006-11-09) CLOSED http://python.org/sf/1593525 opened by Mike Kent No IDLE in Windows (2006-11-09) CLOSED http://python.org/sf/1593634 opened by A_V_I poor urllib error handling (2006-11-09) http://python.org/sf/1593751 opened by Guido van Rossum small problem with description (2006-11-09) CLOSED http://python.org/sf/1593829 opened by Atlas Word should be changed on page 3.6.1 (2006-11-11) CLOSED http://python.org/sf/1594742 opened by jikanter Make docu for dict.update more clear (2006-11-11) CLOSED http://python.org/sf/1594758 opened by Christoph Zwerschke make install fails, various 
modules do not work (2006-11-11) CLOSED http://python.org/sf/1594809 opened by Evan doctest simple usage recipe is misleading (2006-11-12) http://python.org/sf/1594966 opened by Ken Rimey smtplib.SMTP.sendmail() does not provide transparency (2006-11-12) CLOSED http://python.org/sf/1595045 opened by Avi Kivity texinfo library documentation fails to build (2006-11-12) http://python.org/sf/1595164 opened by Mark Diekhans User-agent header added by an opener is "frozen" (2006-11-13) http://python.org/sf/1595365 opened by Bj?rn Steinbrink parser module bug for nested try...except statements (2006-11-13) CLOSED http://python.org/sf/1595594 opened by Kay Schluehr SocketServer allow_reuse_address checked in constructor (2006-11-13) http://python.org/sf/1595742 opened by Peter Parente read() in windows stops on chr(26) (2006-11-13) http://python.org/sf/1595822 opened by reson5 KeyError at exit after 'import threading' in other thread (2006-11-14) http://python.org/sf/1596321 opened by Christian Walther HTTP headers (2006-11-15) http://python.org/sf/1597000 opened by Hugo Leisink Reading with bz2.BZ2File() returns one garbage character (2006-11-15) http://python.org/sf/1597011 opened by Clodoaldo Pinto Neto Can't exclude words before capture group (2006-11-15) CLOSED http://python.org/sf/1597014 opened by Cees Timmerman sqlite timestamp converter bug (floating point) (2006-11-15) http://python.org/sf/1597404 opened by Michael Salib "Report website bug" -> Forbidden :( (2006-11-16) CLOSED http://python.org/sf/1597570 opened by Jens Diemer Modules/readline.c fails to compile on AIX 4.2 (2006-11-16) http://python.org/sf/1597798 opened by Mike Kent atexit.register does not return the registered function. 
(2006-11-16) CLOSED http://python.org/sf/1597824 opened by Pierre Rouleau Python/ast.c:541: seq_for_testlist: Assertion fails (2006-11-16) CLOSED http://python.org/sf/1597930 opened by Darrell Schiebel Top-level exception handler writes to stdout unsafely (2006-11-16) http://python.org/sf/1598083 opened by Jp Calderone Bugs Closed ___________ Unfortunate naming of variable in heapq example (2006-11-08) http://python.org/sf/1592533 closed by gbrandl gettext has problems with .mo files that use non-ASCII chars (2006-11-08) http://python.org/sf/1592627 closed by avantman42 problem building python in vs8express (2006-11-06) http://python.org/sf/1591122 closed by loewis No IDLE in Windows (2006-11-09) http://python.org/sf/1593384 closed by loewis No IDLE in Windows (2006-11-09) http://python.org/sf/1593407 deleted by akuchling mailbox.Maildir.get_folder() loses factory information (2006-10-03) http://python.org/sf/1569790 closed by akuchling No IDLE in Windows (2006-11-09) http://python.org/sf/1593442 deleted by gbrandl site-packages isn't created before install_egg_info (2006-09-28) http://python.org/sf/1566719 closed by pje Modules/unicodedata.c contains C++-style comment (2006-11-09) http://python.org/sf/1593525 closed by doerwalter No IDLE in Windows (2006-11-09) http://python.org/sf/1593634 deleted by gbrandl small problem with description (2006-11-09) http://python.org/sf/1593829 deleted by bauersj Word should be changed on page 3.6.1 (2006-11-11) http://python.org/sf/1594742 closed by gbrandl Make docu for dict.update more clear (2006-11-11) http://python.org/sf/1594758 closed by gbrandl make install fails, various modules do not work (2006-11-11) http://python.org/sf/1594809 closed by loewis gzip.GzipFile.seek missing second argument (2005-10-07) http://python.org/sf/1316069 closed by loewis smtplib.SMTP.sendmail() does not provide transparency (2006-11-12) http://python.org/sf/1595045 deleted by avik parser module bug for nested try...except statements 
(2006-11-13) http://python.org/sf/1595594 closed by gbrandl Can't exclude words before capture group (2006-11-15) http://python.org/sf/1597014 closed by gbrandl "Report website bug" -> Forbidden :( (2006-11-16) http://python.org/sf/1597570 closed by gbrandl atexit.register does not return the registered function. (2006-11-16) http://python.org/sf/1597824 closed by gbrandl quoted printable parse the sequence '= ' incorrectly (2006-10-31) http://python.org/sf/1588217 closed by gbrandl Python/ast.c:541: seq_for_testlist: Assertion fails (2006-11-16) http://python.org/sf/1597930 closed by gbrandl New / Reopened RFE __________________ "".translate() docs should mention string.maketrans() (2006-11-08) http://python.org/sf/1592899 opened by Ori Avtalion base64 doc Python 2.3 <-> 2.4 (2006-11-16) CLOSED http://python.org/sf/1597576 opened by Jens Diemer RFE Closed __________ wsgi.org link in wsgiref (2006-08-18) http://python.org/sf/1542920 closed by akuchling Move gmtime function from calendar to time module (2003-03-05) http://python.org/sf/697985 closed by akuchling base64 doc Python 2.3 <-> 2.4 (2006-11-16) http://python.org/sf/1597576 closed by gbrandl From python-dev at zesty.ca Sat Nov 18 18:28:23 2006 From: python-dev at zesty.ca (Ka-Ping Yee) Date: Sat, 18 Nov 2006 11:28:23 -0600 (CST) Subject: [Python-Dev] Python in first-year MIT core curriculum Message-ID: Wow. Did you catch this news? http://www-tech.mit.edu/V125/N65/coursevi.html The first four weeks of C1 will be a lot like the first four weeks of 6.001, Abelson said. The difference is that programming will be done in Python and not Scheme. -- ?!ng From fredrik at pythonware.com Sat Nov 18 19:05:05 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sat, 18 Nov 2006 19:05:05 +0100 Subject: [Python-Dev] Python in first-year MIT core curriculum In-Reply-To: References: Message-ID: Ka-Ping Yee wrote: > Wow. Did you catch this news? 
> > http://www-tech.mit.edu/V125/N65/coursevi.html > > The first four weeks of C1 will be a lot like the first > four weeks of 6.001, Abelson said. The difference is > that programming will be done in Python and not Scheme. "This story was published on Wednesday, February 1, 2006." ;-) From brett at python.org Sat Nov 18 22:02:16 2006 From: brett at python.org (Brett Cannon) Date: Sat, 18 Nov 2006 13:02:16 -0800 Subject: [Python-Dev] discussion of schema for new issue tracker starting Message-ID: Discussion of what we want in terms of the schema for the new issue tracker has begun. If you wish to give feedback on what you would like each issue to have in terms of data then please file an issue in the meta tracker at http://psf.upfronthosting.co.za/roundup/meta/ . You can see the current test tracker at http://psf.upfronthosting.co.za/roundup/tracker/ . And the tracker-discuss mailing list is at http://mail.python.org/mailman/listinfo/tracker-discuss (although you can bypass the list and use the meta tracker to your ideas relating to the schema). If you do participate through the meta tracker please sign up for an account so that it is not anonymous. I really hope that Anthony and Neal can participate so that we can make sure the tracker does what they need to make their lives easier during a release. And obviously everyone who still works with bugs and patches should participate as well. We can change the schema even after we launch to the new tracker, but it would be nice to minimize the amount of feature churn once the tracker is up and going. -Brett -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061118/95531b46/attachment.html From martin at v.loewis.de Sun Nov 19 11:58:37 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 19 Nov 2006 11:58:37 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook Message-ID: <456038DD.4040304@v.loewis.de> Patch #849407 proposes to change the meaning of the urllib reporthook so that it takes the amount of the data read instead of the block size as its second argument. While this is a behavior change (and even for explicitly-documented behavior), I still propose to apply the change: - in many cases, the number of bytes read will equal to the block size, so no change should occur - the signature (number of parameters) does not change, so applications shouldn't crash because of that change - applications that do use the parameter to estimate total download time now get a better chance to estimate since they learn about short reads. What do you think? Regards, Martin From phd at phd.pp.ru Mon Nov 20 09:44:57 2006 From: phd at phd.pp.ru (Oleg Broytmann) Date: Mon, 20 Nov 2006 11:44:57 +0300 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <456038DD.4040304@v.loewis.de> References: <456038DD.4040304@v.loewis.de> Message-ID: <20061120084457.GF32570@phd.pp.ru> On Sun, Nov 19, 2006 at 11:58:37AM +0100, "Martin v. L?wis" wrote: > - the signature (number of parameters) does not > change, so applications shouldn't crash because > of that change I am slightly worried about the change in semantics. > - applications that do use the parameter to > estimate total download time now get a better > chance to estimate since they learn about > short reads. +1 Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd at phd.pp.ru Programmers don't die, they just GOSUB without RETURN. 
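[Summary note: the thread above debates whether the reporthook's second argument should carry the nominal block size or the actual bytes read. A minimal sketch of the difference, not taken from any message in the thread — the hook name and simulated byte counts are illustrative only:]

```python
def make_reporthook():
    # A urlretrieve-style progress hook: urllib calls
    # hook(count, block_size, total_size) once per block.  The classic
    # count * block_size estimate over-counts on a short final read;
    # under Martin's proposed semantics the second argument is the
    # actual number of bytes read, so simply summing it stays exact.
    state = {"read": 0}
    def hook(count, block_size, total_size):
        state["read"] += block_size
        if total_size > 0:
            print("%d%%" % (100 * state["read"] // total_size))
    return hook, state

hook, state = make_reporthook()
# Simulate a 25600-byte transfer: three full 8192-byte blocks,
# then one short read of 1024 bytes at the end.
for i, nbytes in enumerate((8192, 8192, 8192, 1024), 1):
    hook(i, nbytes, 25600)
print(state["read"])  # 25600 -- exact, even with the short last read
```

With count * block_size the same transfer would report 4 * 8192 = 32768 bytes, which is the breakage Fredrik describes below.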
From fredrik at pythonware.com Mon Nov 20 13:20:01 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 20 Nov 2006 13:20:01 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook References: <456038DD.4040304@v.loewis.de> Message-ID: Martin v. Löwis wrote: > While this is a behavior change (and even for > explicitly-documented behavior), I still propose > to apply the change: > - in many cases, the number of bytes read will > equal to the block size, so no change should > occur > - the signature (number of parameters) does not > change, so applications shouldn't crash because > of that change > - applications that do use the parameter to > estimate total download time now get a better > chance to estimate since they learn about > short reads. haven't used the reporthook, but my reading of the documentation would have led me to believe that I should do count*blocksize to determine how much data I've gotten this far. changing the blocksize without setting the count to zero would break such code. From jimjjewett at gmail.com Mon Nov 20 17:57:31 2006 From: jimjjewett at gmail.com (Jim Jewett) Date: Mon, 20 Nov 2006 11:57:31 -0500 Subject: [Python-Dev] Results of the SOC projects Message-ID: Brett: > As for the adding of logging to the stdlib modules ... we need the mentors > to step forward and say something about that. The logging additions are not ready for stdlib inclusion at this time. Some modules are closer than others, but whether it makes sense to add them piecemeal is a different question. -jJ From fredrik at pythonware.com Mon Nov 20 20:34:10 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Mon, 20 Nov 2006 20:34:10 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations Message-ID: the FAQ contains a list of "atomic" operations, and someone recently asked whether += belongs to this group. 
can anyone who knows the answer perhaps add a comment to: http://effbot.org/pyfaq/what-kinds-of-global-value-mutation-are-thread-safe.htm ? (other comments on that page are of course also welcome) From martin at v.loewis.de Mon Nov 20 23:45:05 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 20 Nov 2006 23:45:05 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: References: <456038DD.4040304@v.loewis.de> Message-ID: <45622FF1.4030202@v.loewis.de> Fredrik Lundh schrieb: > haven't used the reporthook, but my reading of the documentation would have led me > to believe that I should do count*blocksize to determine how much data I've gotten this > far. changing the blocksize without setting the count to zero would break such code. Right - such code would break. I believe the code would also break when the count is set to zero; I can't see how this would help. The question is whether this breakage is a strong enough reason not to change the code. Regards, Martin From martin at v.loewis.de Mon Nov 20 23:55:42 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Mon, 20 Nov 2006 23:55:42 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: References: Message-ID: <4562326E.904@v.loewis.de> Fredrik Lundh schrieb: > the FAQ contains a list of "atomic" operations, and someone recently > asked whether += belongs to this group. In general, += isn't atomic: it may invoke __add__ or __iadd__ on the left-hand side, or __radd__ on the right-hand side. From your list, I agree with Josiah Carlson's observation that the examples you give involve separate name lookups (e.g. L.append(x) loads L, then fetches L.append, then loads x, then calls append, each in a single opcode); the actual operation is atomic. 
If you only look at the actual operation, then these aren't atomic: x.field = y # may invoke __setattr__, may also be a property D[x] = y # may invoke x.__hash__, and x.__eq__ I'm uncertain whether D1.update(D2) will invoke callbacks (it probably will). Regards, Martin From scott+python-dev at scottdial.com Tue Nov 21 00:23:59 2006 From: scott+python-dev at scottdial.com (Scott Dial) Date: Mon, 20 Nov 2006 18:23:59 -0500 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: References: <456038DD.4040304@v.loewis.de> Message-ID: <4562390F.8090703@scottdial.com> Fredrik Lundh wrote: > haven't used the reporthook, but my reading of the documentation would have led me > to believe that I should do count*blocksize to determine how much data I've gotten this > far. changing the blocksize without setting the count to zero would break such code. > > > I'm not sure where the error in your reading happened, but I read the docs and got the same thing out of it except that there is no problem with Martin's change. This API doesn't seem to make much sense anyways because who is going to be interested in the count? Fixing the count to one always and setting blocksize to the actual amount of data makes the most sense in recovering this API. The only potential problem is if there is a non-null answer to "who is going to be interested in the count?" 
-- Scott Dial scott at scottdial.com scodial at cs.indiana.edu From julvar at tamu.edu Mon Nov 13 06:27:49 2006 From: julvar at tamu.edu (Julian) Date: Sun, 12 Nov 2006 23:27:49 -0600 Subject: [Python-Dev] Suggestion/ feature request Message-ID: <014701c706e4$75612780$24b75ba5@aero.ad.tamu.edu> Hello, I am using python with swig and I get a lot of macro redefinition warnings like so: warning C4005: '_CRT_SECURE_NO_DEPRECATE' : macro redefinition In the file - pyconfig.h - rather than the following lines, I was wondering if it would be more reasonable to use #ifdef statements as shown in the bottom of the email... #define _CRT_SECURE_NO_DEPRECATE 1 #define _CRT_NONSTDC_NO_DEPRECATE 1 #if !defined(_CRT_SECURE_NO_DEPRECATE) # define _CRT_SECURE_NO_DEPRECATE #endif #if !defined(_CRT_NONSTDC_NO_DEPRECATE) # define _CRT_NONSTDC_NO_DEPRECATE #endif Just a suggestion... Thanks for reading! Julian. From matt.kern at undue.org Fri Nov 17 13:15:22 2006 From: matt.kern at undue.org (Matt Kern) Date: Fri, 17 Nov 2006 12:15:22 +0000 Subject: [Python-Dev] POSIX Capabilities Message-ID: <20061117121522.GA13677@pling.qwghlm.org> I was looking around for an interface to POSIX capabilities from Python under Linux. I couldn't find anything that did the job, so I wrote the attached PosixCapabilities module. It has a number of shortcomings: * it is written using ctypes to interface directly to libcap; * it assumes the sizes/types of various POSIX defined types; * it only gets/sets process capabilities; * it can test/set/clear capability flags. Despite the downsides, I think it would be good to get the package out there. If anyone wishes to adopt it, update it, rewrite it and/or put it into the distribution, then feel free. Regards, Matt -- Matt Kern http://www.undue.org/ -------------- next part -------------- A non-text attachment was scrubbed... 
Name: PosixCapabilities.py Type: text/x-python Size: 7374 bytes Desc: not available Url : http://mail.python.org/pipermail/python-dev/attachments/20061117/4db99f3b/attachment-0001.py From kate01123 at gmail.com Sat Nov 18 20:40:45 2006 From: kate01123 at gmail.com (Kate Minola) Date: Sat, 18 Nov 2006 14:40:45 -0500 Subject: [Python-Dev] [1593035] Re: readline problem with python-2.5 Message-ID: <9c27041b0611181140y2ac41d89m2815225db3b42cd9@mail.gmail.com> I have a fix to my bug report 1593035 regarding python-2.5 not working with readline on ia64-Linux. This bug was found while trying to port SAGE (http://modular.math.washington.edu/sage/) to ia64-Linux. The problem is caused by the line of Modules/readline.c in flex_complete() return completion_matches(text, *on_completion); In readline-5.2, completion_matches() is defined in compat.c as char ** completion_matches(const char *,rl_compentry_func_t *); But in Modules/readline.c completion_matches() by default is assumed to return an int, and on_completion() is defined as char *. To fix the problem, both the function itself and the second argument need to be cast to the correct types in Modules/readline.c/flex_complete() return (char **) completion_matches(text, (rl_compentry_func_t *)*on_completion); and completion_matches needs to be defined as an external function. 
I added the following else clause to the ifdef at the top of Modules/readline.c/flex_complete() #ifdef HAVE_RL_COMPLETION_MATCHES #define completion_matches(x, y) \ rl_completion_matches((x), ((rl_compentry_func_t *)(y))) #else extern char ** completion_matches(const char *,rl_compentry_func_t *); #endif Kate Minola University of Maryland, College Park From guido at python.org Tue Nov 21 02:24:21 2006 From: guido at python.org (Guido van Rossum) Date: Mon, 20 Nov 2006 17:24:21 -0800 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <45622FF1.4030202@v.loewis.de> References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> Message-ID: Is there any reason to assume the data size is ever less than the block size except for the last data block? It's reading from a pseudo-file tied to a socket, but Python files tend to have the property that read(n) returns exactly n bytes unless at EOF. BTW I left a longer comment at SF earlier. On 11/20/06, "Martin v. L?wis" wrote: > Fredrik Lundh schrieb: > > haven't used the reporthook, but my reading of the documentation would have led me > > to believe that I should do count*blocksize to determine how much data I've gotten this > > far. changing the blocksize without setting the count to zero would break such code. > > Right - such code would break. I believe the code would also break when > the count is set to zero; I can't see how this would help. > > The question is whether this breakage is a strong enough reason not to > change the code. 
> > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From exarkun at divmod.com Tue Nov 21 03:53:04 2006 From: exarkun at divmod.com (Jean-Paul Calderone) Date: Mon, 20 Nov 2006 21:53:04 -0500 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <4562326E.904@v.loewis.de> Message-ID: <20061121025304.20948.524767534.divmod.quotient.36726@ohm> On Mon, 20 Nov 2006 23:55:42 +0100, "\"Martin v. L?wis\"" wrote: >Fredrik Lundh schrieb: >> the FAQ contains a list of "atomic" operations, and someone recently >> asked whether += belongs to this group. > >In general, += isn't atomic: it may invoke __add__ or __iadd__ on the >left-hand side, or __radd__ on the right-hand side. > >From your list, I agree with Josiah Carlson's observation that the >examples you give involve separate name lookups (e.g. L.append(x) >loads L, then fetches L.append, then loads x, then calls append, >each in a single opcode); the actual operation is atomic. > >If you only look at the actual operation, then these aren't atomic: > >x.field = y # may invoke __setattr__, may also be a property >D[x] = y # may invoke x.__hash__, and x.__eq__ > >I'm uncertain whether D1.update(D2) will invoke callbacks (it >probably will). Quite so: >>> class X: ... def __del__(self): ... print 'X.__del__' ... >>> a = {1: X()} >>> b = {1: 2} >>> a.update(b) X.__del__ >>> Jean-Paul From steven.bethard at gmail.com Tue Nov 21 05:05:22 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Mon, 20 Nov 2006 21:05:22 -0700 Subject: [Python-Dev] DRAFT: python-dev summary for 2006-10-01 to 2006-10-15 Message-ID: Here's the summary for the first half of October. As always, comments and corrections are greatly appreciated. 
============= Announcements ============= ----------------------------- QOTF: Quotes of the Fortnight ----------------------------- Martin v. Löwis on a small change to Python that wouldn't affect many applications: I'm pretty sure someone will notice, though; someone always notices. Contributing thread: - `Caching float(0.0) `__ Steve Holden reminds us that patch submissions are dramatically preferred to verbose thread discussions: This thread has disappeared down a rat-hole, never to re-emerge with anything of significant benefit to users. C'mon, guys, implement a patch or leave it alone :-) Contributing thread: - `Caching float(0.0) `__ ========= Summaries ========= -------------- Caching floats -------------- Nick Craig-Wood discovered that he could save 7MB in his application by adding the following simple code:: if age == 0.0: age = 0.0 A large number of his calculations were producing the value 0.0, which meant that many copies of 0.0 were being stored. Since all 0.0 literals refer to the same object, the code above was removing all the duplicate copies of 0.0. Skip Montanaro played around a bit with floatobject.c, and found that Python's test suite allocated a large number of small integral floats (though only a couple hundred were generally allocated at the same time). Kristján V. Jónsson played around with caching for float values between -10.0 and 10.0 with the EVE server and got a 25% savings in allocations. There was some concern that for systems with both +0.0 and -0.0, the cache might cause problems, since determining which zero you have seemed difficult. However, James Y Knight showed how to do this fairly easily in C with a double/uint64_t union. Eventually, people agreed that it should be fine to just cache +0.0. Kristján V. Jónsson and Josiah Carlson proposed patches, but nothing was posted to SourceForge. 
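[Summary note: the union trick referenced above has a straightforward Python analogue; the following sketch is illustrative only and is not from the thread — it compares raw IEEE-754 bytes with ``struct``, mirroring the C double/uint64_t comparison.]

```python
import struct

def is_plus_zero(x):
    # -0.0 == 0.0 is True, so an equality test alone cannot decide
    # which zero a value is.  Comparing the raw IEEE-754 bytes can:
    # the two zeros differ only in the sign bit.
    return x == 0.0 and struct.pack("<d", x) == struct.pack("<d", 0.0)

print(-0.0 == 0.0)         # True: why == alone is not enough
print(is_plus_zero(0.0))   # True: safe to return the cached object
print(is_plus_zero(-0.0))  # False: sign bit set, don't cache it
```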
Contributing threads: - `Caching float(0.0) `__ - `Caching float(0.0) `__ -------------------------------------------- Buffer overrun in repr() and Python releases -------------------------------------------- The implications of PSF-2006-001_, a buffer overrun problem in repr(), were considered for the various Python releases. The bug had been fixed before Python 2.5 was released, and had been applied to the Python 2.4 branch shortly before Python 2.4.4 was released. The security advisory provided patches for both Python 2.3 and 2.4, but to make sure that full source releases were available for all major versions of Python still in use, it looked like there would be a source-only 2.3.6 release (source-only because neither Mac nor Windows builds were affected). .. _PSF-2006-001: http://www.python.org/news/security/PSF-2006-001/ Contributing threads: - `Security Advisory for unicode repr() bug? `__ - `2.3.6 for the unicode buffer overrun `__ --------------------------- Build system for python.org --------------------------- Anthony, Barry Warsaw, Georg Brandl and others indicated that the current website build system was making releases and other updates more difficult than they should be. Most people didn't have enough cycles to spare for this, but Michael Foord said he could help with a transition to rest2web_ if that was desirable. Fredrik Lundh also suggested a few options, including his older proposal to `use Django`_. No definite plans were made though. .. _rest2web: http://www.voidspace.org.uk/python/rest2web/ .. _use Django: http://effbot.org/zone/pydotorg-cache.htm Contributing thread: - `2.3.6 for the unicode buffer overrun `__ -------------------- String concatenation -------------------- Larry Hastings posted a `patch for string concatenation`_ that delays the creation of a new string until someone asks for the string's value. 
As a result, the following code would be about as fast as the ``''.join(strings)`` idiom:: result = '' for s in strings: result += s To achieve this, he had to change ``PyStringObject.ob_sval`` from a ``char[1]`` array, to a ``char *``. Reaction was mixed -- some people really disliked using ``join()``, while others didn't see the need for such a change. .. _patch for string concatenation: http://bugs.python.org/1569040 Contributing thread: - `PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom `__ ---------------------------- PEP 315: Enhanced While Loop ---------------------------- Hans Polak revived the discussion about `PEP 315`_, which proposes a do-while loop for Python that would allow the current code:: while True: <setup code> if not <condition>: break to be written instead as:: do: <setup code> while <condition>: Hans was hoping to simplify the situation where there is no ``<loop body>`` following the ``<condition>`` test and a number of syntax suggestions were proposed to this end. In the end, Guido indicated that none of the suggestions were acceptable, and Raymond Hettinger offered to withdraw the PEP. .. _PEP 315: http://www.python.org/dev/peps/pep-0315/ Contributing threads: - `PEP 351 - do while `__ - `PEP 351 - do while `__ - `PEP 315 - do while `__ ------------------------------ PEP 355: path objects rejected ------------------------------ Luis P Caamano asked about the status of `PEP 355`_, which aimed to introduce an object-oriented reorganization of Python's path-related functions. Guido indicated that the current "amalgam of unrelated functionality" was unacceptable and pronounced it dead. 
Nick Coghlan elaborated the "amalgam" point, explaining that `PEP 355`_ lumped together all the following: - string manipulation operations - abstract path manipulation operations - read-only traversal of a concrete filesystem - addition and removal of files/directories/links within a concrete filesystem Jason Orendorff pointed out some other problems with the PEP: - the motivation was weak - the API had too many methods - it didn't fix all the perceived problems with the existing APIs - it would have introduced a Second Way To Do It without being clearly better than the current way There were some rumors of a new PEP based on Twisted's filepath_ module, but nothing concrete at the time of this summary. .. _PEP 355: http://www.python.org/dev/peps/pep-0355/ .. _filepath: http://twistedmatrix.com/trac/browser/trunk/twisted/python/filepath.py Contributing threads: - `PEP 355 status `__ - `PEP 355 status `__ ---------------------------------- Processes and threading module API ---------------------------------- Richard Oudkerk proposed a module that would make processes usable with an API like that of the threading module. People seemed unsure as to whether it would be better to have a threading-style or XML-RPC-style API. A few other relevant modules were identified, including PyXMLRPC_ and POSH_. No clear winner emerged. .. _PyXMLRPC: http://sourceforge.net/projects/py-xmlrpc/ .. _POSH: http://poshmodule.sourceforge.net/ Contributing thread: - `Cloning threading.py using proccesses `__ ------------------------------------ Python 3.0: registering methods in C ------------------------------------ After a brief exchange about which ``tp_flags`` implied which others, there was some discussion on how to simplify ``tp_flags`` for Python 3000. Raymond Hettinger suggested that the NULL or non-NULL status of a slot should be enough to indicate its presence. Martin v. 
L?wis pointed out that this would require recompiling extension modules for every release, since if a new slot is added, extension modules from earlier releases wouldn't even *have* the slot. Fredrik Lundh suggested a `dynamic registration method`_ instead, which would look something like:: static PyTypeObject *NoddyType; NoddyType = PyType_Setup("noddy.Noddy", sizeof(Noddy)); PyType_Register(NoddyType, PY_TP_DEALLOC, Noddy_dealloc); PyType_Register(NoddyType, PY_TP_DOC, "Noddy objects"); ... PyType_Register(NoddyType, PY_TP_NEW, Noddy_new); if (PyType_Ready(&NoddyType) < 0) return; People thought this looked like a good idea, and Fredrik Lundh planned to look into it seriously for Python 3000. .. _dynamic registration method: http://effbot.org/zone/idea-register-type.htm Contributing thread: - `2.4.4: backport classobject.c HAVE_WEAKREFS? `__ ----------------------------------------- Tracker Recommendations: JIRA and Roundup ----------------------------------------- The PSF Infrastructure Committee announced their recommendations for trackers to replace SourceForge. Both JIRA and Roundup were definite improvements over SourceForge, though the Infrastructure Committee was leaning towards JIRA since Atlassian had offered to host it for them. Roundup was still under consideration if 6-10 admins could volunteer to maintain the installation. (More updates on this in the next summary.) Contributing thread: - `PSF Infrastructure Committee's recommendation for a new issue tracker `__ ----------------------------- PEP 302: import hooks phase 2 ----------------------------- Brett Cannon announced that he'd be working on a C implementation of phase 2 of `PEP 302`_. Phillip J. Eby pointed out that phase 2 could not be implemented in a backwards-compatible way, and so the code should be targeted at the p3yk branch. 
He also suggested that rewriting the import mechanisms in Python was probably going to be easier than trying to do it in C, particularly since some of the pieces were already available in the pkgutil module. Neal Norwitz strongly agreed, pointing out that string and list manipulation, which is necessary in a variety of places in the import mechanisms, is much easier in Python than in C. Brett promised a Python implementation as part of his research work. .. _PEP 302: http://www.python.org/dev/peps/pep-0302/ Contributing thread: - `Created branch for PEP 302 phase 2 work (in C) `__ ------------------------------------------ Web crawlers and development documentation ------------------------------------------ Fredrik Lundh noticed that Google was sometimes finding the `development documentation`_ instead of the `current release documentation`_. A.M. Kuchling added a ``robots.txt`` to keep crawlers out of the development area. .. _development documentation: http://docs.python.org/dev/ .. _current release documentation: http://docs.python.org/ Contributing thread: - `what's really new in python 2.5 ? `__ ----------------------------------------- Buildbots, compile errors and batch files ----------------------------------------- Tim Peters noticed that bsddb was getting compile errors on Windows but the buildbots were not reporting anything. Because some additional commands were added after the call to ``devenv.com`` in the ``build.bat`` script, the error status was not getting propagated appropriately. After Tim and Martin v. L?wis figured out how to repair this, the buildbots were again able to report compile errors. Contributing thread: - `2.4 vs Windows vs bsddb `__ --------------------------------- Python 2.5 and Visual Studio 2005 --------------------------------- Kristj?n V. J?nsson showed that using Visual Studio 2005 instead of Visual Studio 2003 gave a 7% gain in speed, and a 10% gain when performance guided optimization (PGO) was enabled. 
While the "official" compiler can't get changed at a point release, everyone agreed that making the PCBuild8 directory work out of the box and adding an appropriate buildslave was a good idea. Kristj?n promised to look into setting up a buildslave. Contributing thread: - `Python 2.5 performance `__ ---------------------------------- Distributing debug build of Python ---------------------------------- David Abrahams asked if python.org would be willing to post links to the ActiveState debug builds of Python to make it easier for Boost.Python_ users to obtain a debug build. People seemed to think that Boost.Python_ users should be able to create a debug build of Python themselves if necessary. .. _Boost.Python: http://www.boost.org/libs/python/doc/index.html Contributing thread: - `Plea to distribute debugging lib `__ ----------------------------------------- Unmarshalling/Unpickling multiple objects ----------------------------------------- Tim Lesher proposed adding a generator to marshal and pickle so that instead of:: while True: try: obj = marshal.load(fobj) # or pickle.load(fobj) except EOFError: break ... do something with obj ... you could write something like:: for obj in marshal.loaditer(fobj): # or pickle.loaditer(fobj) ... do something with obj ... when you wanted to load multiple objects in sequence from the same file. Both Perforce and Mailman store objects in a way that would benefit from such a function, so it seemed like such an API might be reasonable. No patch had been submitted at the time of this summary. Contributing thread: - `Iterating over marshal/pickle `__ ------------------------------- spawnvp and spawnvpe on Windows ------------------------------- Alexey Borzenkov asked why spawnvp and spawnvpe weren't available in Python on Windows even though they were implemented in the CRT. He got the usual answer, that no one had submitted an appropriate patch, but that such a patch would be a reasonable addition for Python 2.6. 
Fredrik Lundh pointed out that the subprocess module was probably a better choice than spawnvp and spawnvpe anyway. Contributing thread: - `Why spawnvp not implemented on Windows? `__ ================== Previous Summaries ================== - `difficulty of implementing phase 2 of PEP 302 in Python source `__ - `Python Doc problems `__ - `Signals, threads, blocking C functions `__ =============== Skipped Threads =============== - `Removing __del__ `__ - `Tix not included in 2.5 for Windows `__ - `Weekly Python Patch/Bug Summary `__ - `HAVE_UINTPTR_T test in configure.in `__ - `OT: How many other people got this spam? `__ - `2.4.4 fixes `__ - `2.4.4 fix: Socketmodule Ctl-C patch `__ - `[Python-checkins] r51862 - python/branches/release25-maint/Tools/msi/msi.py `__ - `Fwd: [ python-Feature Requests-1567948 ] poplib.py list interface `__ - `Can't check in on release25-maint branch `__ - `if __debug__: except Exception, e: pdb.set_trace() `__ - `2.5, 64 bit `__ - `BUG (urllib2) Authentication request header is broken on long usernames and passwords `__ - `[Python-3000] Sky pie: a "var" keyword `__ - `Proprietary code in python? `__ - `DRAFT: python-dev summary for 2006-08-16 to 2006-08-31 `__ - `BRANCH FREEZE, release24-maint for 2.4.4c1. 00:00UTC, 11 October 2006 `__ - `2.4 vs Windows vs bsddb [correction] `__ - `RELEASED Python 2.4.4, release candidate 1 `__ - `ConfigParser: whitespace leading comment lines `__ - `Exceptions and slicing `__ - `Proposal: No more standard library additions `__ - `[py3k] Re: Proposal: No more standard library additions `__ - `Modulefinder `__ - `VC6 support on release25-maint `__ - `os.utime on directories: bug fix or new feature? 
`__ - `Problem building module against Mac Python 2.4 and Python 2.5 `__ From aahz at pythoncraft.com Tue Nov 21 05:12:32 2006 From: aahz at pythoncraft.com (Aahz) Date: Mon, 20 Nov 2006 20:12:32 -0800 Subject: [Python-Dev] POSIX Capabilities In-Reply-To: <20061117121522.GA13677@pling.qwghlm.org> References: <20061117121522.GA13677@pling.qwghlm.org> Message-ID: <20061121041232.GB25517@panix.com> On Fri, Nov 17, 2006, Matt Kern wrote: > > I was looking around for an interface to POSIX capabilities from Python > under Linux. I couldn't find anything that did the job, so I wrote the > attached PosixCapabilities module. It has a number of shortcomings: Please upload it to the Cheeseshop; optional is making an announcement on c.l.py.announce. python-dev really is not the right place. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ "In many ways, it's a dull language, borrowing solid old concepts from many other languages & styles: boring syntax, unsurprising semantics, few automatic coercions, etc etc. But that's one of the things I like about it." --Tim Peters on Python, 16 Sep 1993 From martin at v.loewis.de Tue Nov 21 06:56:20 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 06:56:20 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> Message-ID: <45629504.1070002@v.loewis.de> Guido van Rossum schrieb: > Is there any reason to assume the data size is ever less than the > block size except for the last data block? It's reading from a > pseudo-file tied to a socket, but Python files tend to have the > property that read(n) returns exactly n bytes unless at EOF. Right: socket._fileobject will invoke recv as many times as necessary to read the requested amount of data. 
I was somehow assuming that it maps read() to read(2), which, in turn, would directly map to recv(2), which could return less data. So it's a semantic change only for the last block. Regards, Martin From martin at v.loewis.de Tue Nov 21 07:01:26 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 07:01:26 +0100 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <014701c706e4$75612780$24b75ba5@aero.ad.tamu.edu> References: <014701c706e4$75612780$24b75ba5@aero.ad.tamu.edu> Message-ID: <45629636.6090407@v.loewis.de> Julian schrieb: > I am using python with swig and I get a lot of macro redefinition warnings > like so: > warning C4005: '_CRT_SECURE_NO_DEPRECATE' : macro redefinition > > In the file - pyconfig.h - rather than the following lines, I was wondering > if it would be more reasonable to use #ifdef statements as shown in the > bottom of the email... While I agree that would be reasonable, I also wonder why you are getting these errors. Where is the first definition of these macros, and how is the macro defined at the first definition? Regards, Martin From martin at v.loewis.de Tue Nov 21 07:07:58 2006 From: martin at v.loewis.de (=?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 07:07:58 +0100 Subject: [Python-Dev] POSIX Capabilities In-Reply-To: <20061117121522.GA13677@pling.qwghlm.org> References: <20061117121522.GA13677@pling.qwghlm.org> Message-ID: <456297BE.6040409@v.loewis.de> Matt Kern schrieb: > I was looking around for an interface to POSIX capabilities from Python > under Linux. I couldn't find anything that did the job, so I wrote the > attached PosixCapabilities module. It has a number of shortcomings: > > * it is written using ctypes to interface directly to libcap; > * it assumes the sizes/types of various POSIX defined types; > * it only gets/sets process capabilities; > * it can test/set/clear capability flags. 
> > Despite the downsides, I think it would be good to get the package out > there. If anyone wishes to adopt it, update it, rewrite it and/or put > it into the distribution, then feel free. As Aahz says: make a distutils package out of it, and upload it to the Cheeseshop. For inclusion into Python, I would rather prefer to see the traditional route: make an autoconf test for presence of these functions, then edit Modules/posixmodule.c to conditionally expose these APIs from posix/os (they are POSIX functions, after all). The standard library should expose them as-is, without providing a convenience wrapper. I believe your implementation has limited portability, due to its usage of hard-coded symbolic values for the capabilites (I guess this is the Linux numbering, right?). Unfortunately, a ctypes-based implementation can't really do much better. Regards, Martin From martin at v.loewis.de Tue Nov 21 07:09:25 2006 From: martin at v.loewis.de (=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?=) Date: Tue, 21 Nov 2006 07:09:25 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <20061121025304.20948.524767534.divmod.quotient.36726@ohm> References: <20061121025304.20948.524767534.divmod.quotient.36726@ohm> Message-ID: <45629815.3060903@v.loewis.de> Jean-Paul Calderone schrieb: >> I'm uncertain whether D1.update(D2) will invoke callbacks (it >> probably will). > > Quite so: > > >>> class X: > ... def __del__(self): > ... print 'X.__del__' > ... > >>> a = {1: X()} > >>> b = {1: 2} > >>> a.update(b) > X.__del__ > >>> Ah, right: that's true for any assignment, then. 
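The same effect can be shown without update(): in CPython, which reclaims objects by reference counting, any plain rebinding that drops the last reference invokes __del__ on the spot. A minimal sketch (the immediate timing is a CPython implementation detail, as the thread goes on to discuss):

```python
deleted = []

class X:
    def __del__(self):
        deleted.append('X.__del__')

a = X()
a = None   # rebinding drops the last reference; in CPython the
           # refcount hits zero and __del__ runs immediately
print(deleted)   # ['X.__del__'] (in CPython)
```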
Regards, Martin From arigo at tunes.org Tue Nov 21 12:51:15 2006 From: arigo at tunes.org (Armin Rigo) Date: Tue, 21 Nov 2006 12:51:15 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <45629504.1070002@v.loewis.de> References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> <45629504.1070002@v.loewis.de> Message-ID: <20061121115115.GA24321@code0.codespeak.net> Hi Martin, On Tue, Nov 21, 2006 at 06:56:20AM +0100, "Martin v. L?wis" wrote: > Right: socket._fileobject will invoke recv as many times as > necessary to read the requested amount of data. I was somehow > assuming that it maps read() to read(2), which, in turn, would > directly map to recv(2), which could return less data. > > So it's a semantic change only for the last block. That means that it would be rather pointless to make the change, right? The original poster's motivation is to get accurate progress during the transfer - but he missed that he already gets that. The proposed change only appears to be relevant together with a hypothetical rewrite of the underlying code, one that would use recv() instead of read(). A bientot, Armin From arigo at tunes.org Tue Nov 21 13:08:37 2006 From: arigo at tunes.org (Armin Rigo) Date: Tue, 21 Nov 2006 13:08:37 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <4562326E.904@v.loewis.de> References: <4562326E.904@v.loewis.de> Message-ID: <20061121120837.GB24321@code0.codespeak.net> Hi Martin, On Mon, Nov 20, 2006 at 11:55:42PM +0100, "Martin v. L?wis" wrote: > In general, += isn't atomic: it may invoke __add__ or __iadd__ on the > left-hand side, or __radd__ on the right-hand side. > If you only look at the actual operation, the these aren't atomic: > > x.field = y # may invoke __setattr__, may also be a property > D[x] = y # may invoke x.__hash__, and x.__eq__ I think this list of examples isn't meant to be read that way. 
Half of them can invoke custom methods, not just the two you mention here. I think the idea is that provided only "built-in enough" objects are involved, the core operation described by each line works atomically, in the sense e.g. that if two threads do 'L.append(x)' you really add two items to the list (only the order is unspecified), and if two threads perform x.field = y roughly at the same time, and the type of x doesn't override the default __setattr__ logic, then you know that the object x will end up with a 'field' that is present and has exactly one of the two values that the threads tried to put in. Python programs rely on these kind of properties, and they are probably a good thing - at least, much better IMHO than having to put locks everywhere. I would even say that the distinction between "preventing the interpreter from crashing" and "getting sane results" is not really relevant. If your program doesn't crash the interpreter, but loose some append()s or produce similar nonsense if you forget a lock, then we get the drawbacks of the GIL without its benefits... In practice, the list of operations that is atomic should (ideally) be documented more precisely -- one way to do that is to specify it at the level of built-in methods instead of syntax, e.g. saying that the method list.append() works atomically, and so does dict.setdefault() as long as all keys are "built-in enough" objects. A bientot, Armin From guido at python.org Tue Nov 21 17:10:03 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 21 Nov 2006 08:10:03 -0800 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <20061121115115.GA24321@code0.codespeak.net> References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> <45629504.1070002@v.loewis.de> <20061121115115.GA24321@code0.codespeak.net> Message-ID: OK, so let's reject the change. On 11/21/06, Armin Rigo wrote: > Hi Martin, > > On Tue, Nov 21, 2006 at 06:56:20AM +0100, "Martin v. 
L?wis" wrote: > > Right: socket._fileobject will invoke recv as many times as > > necessary to read the requested amount of data. I was somehow > > assuming that it maps read() to read(2), which, in turn, would > > directly map to recv(2), which could return less data. > > > > So it's a semantic change only for the last block. > > That means that it would be rather pointless to make the change, right? > The original poster's motivation is to get accurate progress during the > transfer - but he missed that he already gets that. > > The proposed change only appears to be relevant together with a > hypothetical rewrite of the underlying code, one that would use recv() > instead of read(). > > > A bientot, > > Armin > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From facundobatista at gmail.com Tue Nov 21 17:42:40 2006 From: facundobatista at gmail.com (Facundo Batista) Date: Tue, 21 Nov 2006 13:42:40 -0300 Subject: [Python-Dev] Results of the SOC projects In-Reply-To: References: Message-ID: 2006/11/15, Georg Brandl : > this might seem a bit late, and perhaps I was just blind, > but I miss something like a summary how the Python > summer of code projects went, and what the status of the ones > that were meant to improve the standard library, e.g. the > C decimal implementation, is. The C decimal implementation is quite finished, but really not ready for production usage. Actually, what this work proved is that is not enough to translate decimal.py, there should be a redesign of the structure. There's a lot of mails from Raymond H. about this. And he's right. Regarding the SOC, I approved Matheusz's work, because he finished the task, and even if we need to recode it, we learned in the process. You're free to look at it, the C decimal implementation is in the sandbox. Regards, -- . 
Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/ From martin at v.loewis.de Tue Nov 21 19:22:27 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 19:22:27 +0100 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <009201c70d36$714dd5f0$24b75ba5@aero.ad.tamu.edu> References: <009201c70d36$714dd5f0$24b75ba5@aero.ad.tamu.edu> Message-ID: <456343E3.4000203@v.loewis.de> Julian schrieb: > SWIG seems to have done it properly by checking to see if it has been > defined already (which, I think, is how python should do it as well) > Now, even if I am not using SWIG, I could imagine these being defined > elsewhere (by other headers/libraries) or even by setting them in the VS2005 > IDE project settings (which I actually do sometimes). While these are *just* > warnings and not errors, it would look cleaner if pyconfig.h would check if > they were defined already. Sure; I have fixed this now in r52817 and r52818 I just wondered why you get the warning: you shouldn't get one if the redefinition is the same as the original one. In this case, it wasn't the same redefinition, as SWIG was merely defining them, and Python was defining them to 1. Regards, Martin From martin at v.loewis.de Tue Nov 21 19:30:24 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 19:30:24 +0100 Subject: [Python-Dev] Passing actual read size to urllib reporthook In-Reply-To: <20061121115115.GA24321@code0.codespeak.net> References: <456038DD.4040304@v.loewis.de> <45622FF1.4030202@v.loewis.de> <45629504.1070002@v.loewis.de> <20061121115115.GA24321@code0.codespeak.net> Message-ID: <456345C0.6010204@v.loewis.de> Armin Rigo schrieb: > Hi Martin, > > On Tue, Nov 21, 2006 at 06:56:20AM +0100, "Martin v. L?wis" wrote: >> Right: socket._fileobject will invoke recv as many times as >> necessary to read the requested amount of data. 
I was somehow >> assuming that it maps read() to read(2), which, in turn, would >> directly map to recv(2), which could return less data. >> >> So it's a semantic change only for the last block. > > That means that it would be rather pointless to make the change, right? Right; I rejected the patch. Thanks for all your input. Regards, Martin From martin at v.loewis.de Tue Nov 21 19:41:50 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 19:41:50 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <20061121120837.GB24321@code0.codespeak.net> References: <4562326E.904@v.loewis.de> <20061121120837.GB24321@code0.codespeak.net> Message-ID: <4563486E.30104@v.loewis.de> Armin Rigo schrieb: > I think this list of examples isn't meant to be read that way. Half of > them can invoke custom methods, not just the two you mention here. I > think the idea is that provided only "built-in enough" objects are > involved, the core operation described by each line works atomically, in > the sense e.g. that if two threads do 'L.append(x)' you really add two > items to the list (only the order is unspecified), and if two threads > perform x.field = y roughly at the same time, and the type of x > doesn't override the default __setattr__ logic, then you know that the > object x will end up with a 'field' that is present and has exactly > one of the two values that the threads tried to put in. Ah, so it's more about Consistency (lists not being corrupted) than about Atomicity (operations either succeeding completely or failing completely). Perhaps it's also about Isolation (no intermediate results visible), but I'm not so sure which of these operations are isolated (given the callbacks). > Python programs rely on these kind of properties, and they are probably > a good thing - at least, much better IMHO than having to put locks > everywhere. 
I would even say that the distinction between "preventing > the interpreter from crashing" and "getting sane results" is not really > relevant. If your program doesn't crash the interpreter, but loose some > append()s or produce similar nonsense if you forget a lock, then we get > the drawbacks of the GIL without its benefits... So again, I think it's consistency you are after here (of the ACID properties). > In practice, the list of operations that is atomic should (ideally) be > documented more precisely -- one way to do that is to specify it at the > level of built-in methods instead of syntax, e.g. saying that the method > list.append() works atomically, and so does dict.setdefault() as long as > all keys are "built-in enough" objects. But many of these operations don't work atomically! (although .append does) For example, x = y may cause __del__ for the old value of x to be invoked, which may fail with an exception. If it fails, the assignment is still carried out, instead of being rolled back. Regards, Martin From fumanchu at amor.org Tue Nov 21 20:29:22 2006 From: fumanchu at amor.org (Robert Brewer) Date: Tue, 21 Nov 2006 11:29:22 -0800 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations Message-ID: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> Martin v. L?wis wrote: > Armin Rigo schrieb: > > I think this list of examples isn't meant to be read that > > way. Half of them can invoke custom methods, not just the > > two you mention here. > > Ah, so it's more about Consistency (lists not being corrupted) > than about Atomicity (operations either succeeding completely > or failing completely). Perhaps it's also about Isolation (no > intermediate results visible), but I'm not so sure which of > these operations are isolated (given the callbacks). It's not "about" any of those things, because we're not discussing transactional models. 
The FAQ entry is trying to list statements which can be considered a single operation due to being implemented via a single bytecode. By eliminating statements which use multiple VM instructions, one minimizes overlapping operations; there are other ways, but this is easy and common, and is an important "first step" toward making a container "thread-safe". You're bringing in other, larger issues, which is fine and should be addressed in a larger context. But the FAQ isn't trying to address those. The confusion arises because transactional theory uses "atomic transaction" in a much narrower sense than language design uses the phrase "atomic operation" (see http://en.wikipedia.org/wiki/Atomic_operation for example--it includes isolation and consistency). And the FAQ entry is only addressing the "isolation" concern; whether or not a given operation can be interrupted/overlapped. Those who design thread-safe containers benefit from such a list. Yes, they must also make sure no Python code (like __del__ or __setattr_, etc) is invoked during the operation; others have already pointed out that by using builtins, this can be minimized. But that "second step" doesn't negate the benefit of the "first step", eliminating statements which require multiple VM instructions. Robert Brewer System Architect Amor Ministries fumanchu at amor.org From martin at v.loewis.de Tue Nov 21 21:01:35 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 21:01:35 +0100 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <000001c70da1$15260f20$24b75ba5@aero.ad.tamu.edu> References: <000001c70da1$15260f20$24b75ba5@aero.ad.tamu.edu> Message-ID: <45635B1F.4090401@v.loewis.de> Julian schrieb: > I have two questions though... Is there any reason why Python is defining > them to 1? No particular reason, except that it's C tradition to give macros a value of 1 when you define them. 
> And then later on in the same file: > /* Turn off warnings about deprecated C runtime functions in > VisualStudio .NET 2005 */ > #if _MSC_VER >= 1400 && !defined _CRT_SECURE_NO_DEPRECATE > #define _CRT_SECURE_NO_DEPRECATE > #endif > > Isn't that redundant? It is indeed. > I don't think that second block will ever get > executed. Moreover, in the second block, it is not being defined to 1. why > is that ? Different people have contributed this; the first one came from r41563 | martin.v.loewis | 2005-11-29 18:09:13 +0100 (Di, 29 Nov 2005) Silence VS2005 warnings about deprecated functions. and the second one from r46778 | kristjan.jonsson | 2006-06-09 18:28:01 +0200 (Fr, 09 Jun 2006) | 2 lines Turn off warning about deprecated CRT functions on for VisualStudio .NET 2005. Make the definition #ARRAYSIZE conditional. VisualStudio .NET 2005 already has it defined using a better gimmick. Regards, Martin From martin at v.loewis.de Tue Nov 21 21:24:07 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 21:24:07 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> Message-ID: <45636067.7040305@v.loewis.de> Robert Brewer schrieb: > The confusion arises because transactional theory uses "atomic > transaction" in a much narrower sense than language design uses the > phrase "atomic operation" (see > http://en.wikipedia.org/wiki/Atomic_operation for example--it > includes isolation and consistency). And the FAQ entry is only > addressing the "isolation" concern; whether or not a given operation > can be interrupted/overlapped. Those who design thread-safe > containers benefit from such a list. 
Yes, they must also make sure no > Python code (like __del__ or __setattr_, etc) is invoked during the > operation; others have already pointed out that by using builtins, > this can be minimized. But that "second step" doesn't negate the > benefit of the "first step", eliminating statements which require > multiple VM instructions. Ok. I think I would have understood that FAQ entry better if it had said: "These operations are represented in a single byte-code operation", instead of saying "they are thread-safe", or "they are atomic". Of course, Josiah Carlson's remark then still applies: all statements listed there take multiple byte codes, because you have to put the parameters onto the stack first. This is more than hypothetical. If two threads do simultaneously thread1 thread2 x = y y = x then, if these were "atomic", you would expect that afterwards, both variables have the same value: Either thread1 executes first, which means that x has the value of y (and thread2's operation has no effect), or thread2 executes first, in which case both variables get x's original value. However, in Python, it may happen that afterwards, the values get swapped: thread1 loads y onto the stack, then a context switch occurs, then thread2 sets y = x (so y gets x's value), later thread1 becomes active again, and x gets y's original value (from the thread1 stack). If you were looking for actions where the "core" operation is a single opcode, then this list could be much longer: all of the following are "atomic" or "thread-safe", too: unary operations (+x, -x, not x, ~x) binary operations (a+b,a-b,a*b,a/b,a//b,a[b]) exec "string" del x del x.field del x[i] As for the original questions: "x+=1" is two "atomic" operations, not one. Or, more precisely, it's 4 opcodes, not 2. 
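The opcode count is easy to check with the dis module (a sketch; the exact opcode names vary between CPython versions, but the load and the store are always separate instructions, so a thread switch can fall in between):

```python
import dis

def incr(x):
    x += 1
    return x

# The augmented assignment compiles to separate instructions:
# load x, load the constant 1, add, store the result back into x.
names = [ins.opname for ins in dis.get_instructions(incr)]
print(names)
```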
Regards, Martin From arigo at tunes.org Tue Nov 21 22:56:20 2006 From: arigo at tunes.org (Armin Rigo) Date: Tue, 21 Nov 2006 22:56:20 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <45636067.7040305@v.loewis.de> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> Message-ID: <20061121215620.GA24206@code0.codespeak.net> Hi Martin, On Tue, Nov 21, 2006 at 09:24:07PM +0100, "Martin v. L?wis" wrote: > As for the original questions: "x+=1" is two "atomic" > operations, not one. Or, more precisely, it's 4 opcodes, > not 2. Or, more interestingly, the same is true for constructs like 'd[x]+=1': they are a sequence of three bytecodes that may overlap other threads (read d[x], add 1, store the result back in d[x]) so it's not a thread-safe way to increment a counter. (More generally it's very easy to forget that expr1[expr2] += expr3 really means x = expr1; y = expr2; x[y] = x[y] + expr3 using a '+' that is special only in that it invokes the __iadd__ instead of the __add__ method, if there is one.) A bientot, Armin From martin at v.loewis.de Tue Nov 21 23:24:21 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 21 Nov 2006 23:24:21 +0100 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <20061121215620.GA24206@code0.codespeak.net> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> Message-ID: <45637C95.6050907@v.loewis.de> Armin Rigo schrieb: > Or, more interestingly, the same is true for constructs like 'd[x]+=1': > they are a sequence of three bytecodes that may overlap other threads > (read d[x], add 1, store the result back in d[x]) so it's not a > thread-safe way to increment a counter. 
> > (More generally it's very easy to forget that expr1[expr2] += expr3 > really means > > x = expr1; y = expr2; x[y] = x[y] + expr3 > > using a '+' that is special only in that it invokes the __iadd__ instead > of the __add__ method, if there is one.) OTOH, using += is "thread-safe" if the object is mutable (e.g. a list), and all modifications use +=. In that case, __iadd__ will be invoked, which may (for lists) or may not (for other types) be thread-safe. Since the same object gets assigned to the original slot in all threads, execution order does not really matter. I personally consider it "good style" to rely on implementation details of CPython; if you do, you have to know precisely what these details are, and document why you think a specific fragment of code is correct. Regards, Martin From ncoghlan at gmail.com Wed Nov 22 10:32:23 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Wed, 22 Nov 2006 19:32:23 +1000 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <45637C95.6050907@v.loewis.de> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> Message-ID: <45641927.7080501@gmail.com> Martin v. L?wis wrote: > I personally consider it "good style" to rely on implementation details > of CPython; Is there a 'do not' missing somewhere in there? Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From steven.bethard at gmail.com Wed Nov 22 20:48:48 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 22 Nov 2006 12:48:48 -0700 Subject: [Python-Dev] DRAFT: python-dev summary for 2006-10-16 to 2006-10-31 Message-ID: Here's the summary for the second half of October. 
Comments and corrections welcome as always, especially on that extended buffer protocol / binary format specifier discussion which was a little overwhelming. ;-) ============= Announcements ============= -------------------------------------- Roundup to replace SourceForge tracker -------------------------------------- Roundup_ has been named as the official replacement for the SourceForge_ issue tracker. Thanks go out to the new volunteer admins, Paul DuBois, Michael Twomey, Stefan Seefeld, and Erik Forsberg, and also to `Upfront Systems`_ who will be hosting the tracker. If you'd like to provide input on what the new tracker should do, please join the `tracker-discuss mailing list`_. .. _SourceForge: http://www.sourceforge.net/ .. _Roundup: http://roundup.sourceforge.net/ .. _Upfront Systems: http://www.upfrontsystems.co.za/ .. _tracker-discuss mailing list: http://mail.python.org/mailman/listinfo/tracker-discuss Contributing threads: - `PSF Infrastructure has chosen Roundup as the issue tracker for Python development `__ - `Status of new issue tracker `__ ========= Summaries ========= --------------------------------------------------------------- The buffer protocol and communicating binary format information --------------------------------------------------------------- Travis E. Oliphant presented a pre-PEP for adding a standard way to describe the shape and intended types of binary-formatted data. It was accompanied by a pre-PEP for extending the buffer protocol to handle such shapes and types. Under the proposal, a new ``datatype`` object would describe binary-formatted data with an API like:: datatype((float, (3,2))) # describes a 3*2*8=48 byte block of memory that should be interpreted # as 6 doubles laid out as arr[0,0], arr[0,1], ... arr[2,0], arr[2,1] datatype([( ([1,2],'coords'), 'f4', (3,6)), ('address', 'S30')]) # describes the structure # float coords[3*6] /* Has [1,2] associated with this field */ # char address[30] Alexander Belopolsky provided a nice example of why you might want to extend the buffer protocol along these lines. Currently, there's not much you can do with a basic buffer object. If you want to pass it to numpy_, you have to provide the type and shape information yourself:: >>> b = buffer(array('d', [1,2,3])) >>> numpy.ndarray(shape=(3,), dtype=float, buffer=b) array([ 1., 2., 3.]) By extending the buffer protocol appropriately so that the necessary information can be provided, you should be able to pass the buffer directly to numpy_ and have it understand the format itself:: >>> numpy.array(b) People were uncomfortable with the many ``datatype`` variants -- the constructor accepted types, strings, lists or dicts, each of which could specify the structure in a different way. Also, a number of people questioned why the existing ``ctypes`` mechanisms for describing binary data couldn't be used instead, particularly since ``ctypes`` could already describe things like function pointers and recursive types, which the pre-PEP could not. Travis said he was looking for a way to unify the data formats of all the ``array``, ``struct``, ``numpy`` and ``ctypes`` modules, and felt like using the ``ctypes`` approach was too verbose for use in the other modules. In particular, he felt like the ``ctypes`` use of type objects as binary-format specifiers was problematic because type objects were harder to manipulate at the C level. The discussion continued on into the next fortnight. .. _numpy: Contributing threads: - `PEP: Adding data-type objects to Python `__ - `PEP: Extending the buffer protocol to share array information.
`__ ------------------------ The "lazy strings" patch ------------------------ Discussion continued on Larry Hastings `lazy strings patch`_ that would have delayed until necessary the evaluation of some string operations, like concatenation and slicing. With his patch, repeated string concatenation could be used instead of the standard ``.join()`` idiom, and slices which were never used would never be rendered. Discussions of the patch showed that people were concerned about memory increases when a small slice of a very large string kept the large string around in memory. People also felt like a stronger motivation was necessary to justify complicating the string representation so much. Larry was pointed to some `code that his patch would break`_, which was using ``ob_sval`` directly instead of calling ``PyString_AS_STRING()`` like it was supposed to. He was also referred to the `Python 3000 list`_ where the recent discussions of `string views`_ would be relevant, and his proposal might have a better chance of acceptance. .. _lazy strings patch: http://bugs.python.org/1569040 .. _code that his patch would break: http://www.google.com/codesearch?hl=en&lr=&q=ob_sval+-stringobject.%5Bhc%5D&btnG=Search .. _Python 3000 list: http://mail.python.org/mailman/listinfo/python-3000 .. _string views: http://mail.python.org/pipermail/python-3000/2006-August/003280.html Contributing threads: - `PATCH submitted: Speed up + for string Re: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom `__ - `Python-Dev Digest, Vol 39, Issue 54 `__ - `Python-Dev Digest, Vol 39, Issue 55 `__ - `The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom] `__ - `The "lazy strings" patch `__ -------------- PEP 355 status -------------- BJ?rn Lindqvist wanted to wrap up the loose ends of `PEP 355`_ and asked whether the problem was the specific path object of `PEP 355`_ or path objects in general. 
A number of people felt that some reorganization of the path-related functions could be helpful, but that trying to put everything into a single object was a mistake. Some important requirements for a reorganization of the path-related functions: * should divide the functions into coherent groups * should allow you to manipulate paths foreign to your OS There were a few suggestions of possible new APIs, but no concrete implementations. People seemed hopeful that the issue could be resurrected for Python 3K, but no one appeared to be taking the lead. .. _PEP 355: http://www.python.org/dev/peps/pep-0355/ Contributing thread: - `PEP 355 status `__ -------------------------------------------------- Buildbots, configure changes and extension modules -------------------------------------------------- Grig Gheorghiu, who's been taking care of the `Python Community Buildbots`_, noticed that the buildbots started failing after a checkin that made changes to ``configure``. Martin v. L?wis explained that even though a plain ``make`` will trigger a re-run of ``configure`` if it has changed, there is an issue with distutils not rebuilding when header files change, and so extension modules are sometimes not rebuilt. Contributions to fix that deficiency in distutils are welcome. Martin also pointed out a handy way of forcing a buildbot to start with a clean build: ask the buildbot to build a non-existing branch. This causes the checkouts to be deleted and the build to fail. The next regular build will then start from scratch. .. _Python Community Buildbots: http://www.pybots.org/ Contributing thread: - `Python unit tests failing on Pybots farm `__ --------------- Sqlite versions --------------- Skip Montanaro ran into some problems running ``test_sqlite`` on OSX where he was getting a bunch of ``ProgrammingError: library routine called out of sequence`` errors. These errors appeared reliably when ``test_sqlite`` was run immediately after ctypes' ``test_find``. 
When he started linking to sqlite 3.1.3 instead of sqlite 3.3.8, the problems went away. Barry Warsaw mentioned that he had run into similar troubles when he tried to upgrade from 3.2.1 to 3.2.8. Contributing thread: - `Massive test_sqlite failure on Mac OSX ... sometimes `__ --------------------------------------------- Threads, generators, exceptions and segfaults --------------------------------------------- Mike Klaas managed to `provoke a segfault`_ in Python 2.5 using threads, generators and exceptions. Tim Peters was able to whittle Mike's problem down to a relatively simple test case, where a generator was created within a thread, and then the thread vanished before the generator had exited. The segfault was a result of Python's attempt to clean up the abandoned generator, during which it tried to access the generator's already free()'d thread state. No clear solution to this problem had been decided on at the time of this summary. .. _provoke a segfault: http://bugs.python.org/1579370 Contributing thread: - `Segfault in python 2.5 `__ ---------------- ctypes and win64 ---------------- Previously, Thomas Heller had asked that ctypes be removed from the Python 2.5 win64 MSI installers since it did not work for that platform at the time. Since then, Thomas integrated some patches in the trunk so that _ctypes could be built for win64/AMD64. Backporting these fixes to Python 2.5 would have meant that, while the MSI installer would still not include it, _ctypes could be built from a source distribution on win64/AMD64. It was unclear whether this would constitute a bugfix (in which case the backport would be okay) or a feature (in which case it wouldn't). Contributing thread: - `ctypes and win64 `__ ------------------------------ Python 2.3.X and 2.4.X retired ------------------------------ Anthony Baxter pushed out a Python 2.4.4 release and was pushing out the Python 2.3.6 source release as well. 
He indicated that once 2.3.6 was out, both of these branches could be officially retired. Contributing thread: - `state of the maintenance branches `__ --------------------------------------- Producing bytecode from Python 2.5 ASTs --------------------------------------- Michael Spencer offered up his compiler2_ module, a rewrite of the compiler module which allows bytecode to be produced from ``_ast.AST`` objects. Currently, it produces almost identical output to ``__builtin__.compile`` for all the stdlib modules and their tests. He asked for feedback on what would be necessary to get it stdlib ready, but had no responses. .. _compiler2: http://svn.brownspencer.com/pycompiler/branches/new_ast/ Contributing thread: - `Fwd: Re: ANN compiler2 : Produce bytecode from Python 2.5 AST `__ ================== Previous Summaries ================== - `Python 2.5 performance `__ - `Promoting PCbuild8 (Was: Python 2.5 performance) `__ - `2.3.6 for the unicode buffer overrun `__ - `2.4.4: backport classobject.c HAVE_WEAKREFS? `__ =============== Skipped Threads =============== - `Weekly Python Patch/Bug Summary `__ - `Problem building module against Mac Python 2.4 and Python 2.5 `__ - `svn.python.org down `__ - `BRANCH FREEZE release24-maint, Wed 18th Oct, 00:00UTC `__ - `who is interested on being on a python-dev panel at PyCon? `__ - `RELEASED Python 2.4.4, Final. `__ - `Nondeterministic long-to-float coercion `__ - `Promoting PCbuild8 `__ - `OT: fdopen on Windows question `__ - `Modulefinder `__ - `Optional type checking/pluggable type systems for Python `__ - `readlink and unicode strings (SF:1580674) Patch http://www.python.org/sf/1580674 fixes readlink's behaviour w.r.t. Unicode strings: without this patch this function uses the system default encoding instead of the filesystem encoding to convert Unicode objects to plain strings. Like os.listdir, os.readlink will now return a Unicode object when the argument is a Unicode object. 
What I'd like to know is if this can be backported to the 2.5 branch.
The first part of this patch (use filesystem encoding instead of the
system encoding) is IMHO a bugfix, the second part might break existing
applications (that might not expect a unicode result from os.readlink).

The reason I did this patch is that os.path.realpath currently breaks
when the path is a unicode string with non-ascii characters and at
least one element of the path is a symlink.

Ronald `__

- `readlink and unicode strings (SF:1580674) `__
- `RELEASED Python 2.3.6, release candidate 1 `__
- `__str__ bug? `__
- `Hunting down configure script error `__
- `Python 2.4.4 docs? `__
- `DRAFT: python-dev summary for 2006-09-01 to 2006-09-15 `__
- `DRAFT: python-dev summary for 2006-09-16 to 2006-09-30 `__
- `[Python-checkins] r52482 - in python/branches/release25-maint: Lib/urllib.py Lib/urllib2.py Misc/NEWS `__
- `Typo.pl scan of Python 2.5 source code `__
- `build bots, log output `__
- `PyCon: proposals due by Tuesday 10/31 `__
- `test_codecs failures `__

From martin at v.loewis.de Wed Nov 22 22:49:26 2006
From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=)
Date: Wed, 22 Nov 2006 22:49:26 +0100
Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations
In-Reply-To: <45641927.7080501@gmail.com>
References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com>
Message-ID: <4564C5E6.8070605@v.loewis.de>

Nick Coghlan schrieb:
> Martin v. Löwis wrote:
>> I personally consider it "good style" to rely on implementation details
>> of CPython;
>
> Is there a 'do not' missing somewhere in there?

No - I really mean it. I can find nothing wrong with people relying on
reference counting to close files, for example.
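Concretely, the pattern in question is the implicit-close idiom sketched below (the helper names are illustrative, and the timing claims in the comments hold for CPython only):

```python
import os
import tempfile

def first_line_cpython_style(path):
    # In CPython the file object's reference count drops to zero as soon
    # as readline() returns, so the file is closed immediately. Other
    # implementations (Jython, IronPython, PyPy) only guarantee closure
    # at some later garbage-collection point.
    return open(path).readline()

def first_line_portable(path):
    # The portable spelling makes the close explicit.
    f = open(path)
    try:
        return f.readline()
    finally:
        f.close()

# Tiny self-contained demo: write a scratch file, read its first line back.
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"first\nsecond\n")
os.close(fd)
line1 = first_line_cpython_style(demo_path)
line2 = first_line_portable(demo_path)
os.remove(demo_path)
```

Both helpers behave identically on CPython; the difference is only in when the underlying file descriptor is released on other implementations.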
It's a property of CPython, and not guaranteed in other Python implementations - yet it works in a well-defined way in CPython. Code that relies on that feature is not portable, but portability is only one goal in software development, and may be irrelevant for some projects. Likewise, I see nothing wrong with people relying on .append on a list working "correctly" when used from two threads, even though the language specification does not guarantee that property. Similarly, it's fine when people rely on the C type "int" to have 32-bits when used with gcc on x86 Linux. The C standard makes that implementation-defined, but this specific implementation made a choice that you can rely on. Regards, Martin From kbk at shore.net Thu Nov 23 04:36:13 2006 From: kbk at shore.net (Kurt B. Kaiser) Date: Wed, 22 Nov 2006 22:36:13 -0500 (EST) Subject: [Python-Dev] Weekly Python Patch/Bug Summary Message-ID: <200611230336.kAN3aD59005113@bayview.thirdcreek.com> Patch / Bug Summary ___________________ Patches : 406 open (-10) / 3479 closed (+16) / 3885 total ( +6) Bugs : 931 open ( +1) / 6349 closed (+16) / 7280 total (+17) RFE : 245 open ( +1) / 244 closed ( +0) / 489 total ( +1) New / Reopened Patches ______________________ Logging Module - followfile patch (2006-11-17) http://python.org/sf/1598415 reopened by cjschr Logging Module - followfile patch (2006-11-17) http://python.org/sf/1598415 opened by chads Logging Module - followfile patch (2006-11-17) CLOSED http://python.org/sf/1598426 opened by chads mailbox.py: check that os.fsync is available before using it (2006-11-19) http://python.org/sf/1599256 opened by David Watson CodeContext - Improved text indentation (2005-11-21) CLOSED http://python.org/sf/1362975 reopened by taleinat TCPServer option to bind and activate (2006-11-20) http://python.org/sf/1599845 opened by Peter Parente __bool__ instead of __nonzero__ (2006-11-21) http://python.org/sf/1600346 opened by ganges master 1572210 doc patch (2006-11-21) 
http://python.org/sf/1600491 opened by Jim Jewett Patches Closed ______________ Logging Module - followfile patch (2006-11-17) http://python.org/sf/1598426 closed by gbrandl tkSimpleDialog.askstring() Tcl/Tk-8.4 lockup (2006-08-11) http://python.org/sf/1538878 closed by loewis tkSimpleDialog freezes when apply raises exception (2006-11-11) http://python.org/sf/1594554 closed by loewis Tix: subwidget names (bug #1472877) (2006-10-25) http://python.org/sf/1584712 closed by loewis better error msgs for some TypeErrors (2006-10-29) http://python.org/sf/1586791 closed by gbrandl Auto Complete module for IDLE (2005-11-19) http://python.org/sf/1361016 closed by loewis Add BLANK_LINE to token.py (2004-11-20) http://python.org/sf/1070218 closed by loewis improve embeddability of python (2003-11-25) http://python.org/sf/849278 closed by loewis Extend struct.unpack to produce nested tuples (2003-11-23) http://python.org/sf/847857 closed by loewis Iterating closed StringIO.StringIO (2005-11-18) http://python.org/sf/1359365 closed by loewis urllib reporthook could be more informative (2003-11-26) http://python.org/sf/849407 closed by loewis xmlrpclib - marshalling new-style classes. (2004-11-20) http://python.org/sf/1070046 closed by loewis CodeContext - Improved text indentation (2005-11-21) http://python.org/sf/1362975 closed by loewis Implementation of PEP 3102 Keyword Only Argument (2006-08-30) http://python.org/sf/1549670 closed by gvanrossum readline does not need termcap (2004-12-01) http://python.org/sf/1076826 closed by loewis Make cgi.py use logging module (2004-12-06) http://python.org/sf/1079729 closed by loewis New / Reopened Bugs ___________________ The empty set is a subset of the empty set (2006-11-17) CLOSED http://python.org/sf/1598166 opened by Andreas Kloeckner subprocess.py: O(N**2) bottleneck (2006-11-16) http://python.org/sf/1598181 opened by Ralf W. 
Grosse-Kunstleve import curses fails (2006-11-17) CLOSED http://python.org/sf/1598357 opened by thorvinrhuebarb Misspelled submodule names for email module. (2006-11-17) CLOSED http://python.org/sf/1598361 opened by Dmytro O. Redchuk ctypes Structure allows recursive definition (2006-11-17) http://python.org/sf/1598620 opened by Lenard Lindstrom csv library does not handle '\x00' (2006-11-18) CLOSED http://python.org/sf/1599055 opened by Stephen Day --disable-sunaudiodev --disable-tk does not work (2006-10-17) CLOSED http://python.org/sf/1579029 reopened by thurnerrupert mailbox: other programs' messages can vanish without trace (2006-11-19) http://python.org/sf/1599254 opened by David Watson htmlentitydefs.entitydefs assumes Latin-1 encoding (2006-11-19) CLOSED http://python.org/sf/1599325 opened by Erik Demaine SSL-ed sockets don't close correct? (2004-06-24) http://python.org/sf/978833 reopened by arigo Segfault on bsddb.db.DB().type() (2006-11-20) CLOSED http://python.org/sf/1599782 opened by Rob Sanderson problem with socket.gethostname documentation (2006-11-20) CLOSED http://python.org/sf/1599879 opened by Malte Helmert Immediate Crash on Open (2006-11-20) http://python.org/sf/1599931 opened by Farhymn mailbox: Maildir.get_folder does not inherit factory (2006-11-21) http://python.org/sf/1600152 opened by Tetsuya Takatsuru [PATCH] Quitting The Interpreter (2006-11-20) CLOSED http://python.org/sf/1600157 opened by Chris Carter Tix ComboBox entry is blank when not editable (2006-11-21) http://python.org/sf/1600182 opened by Tim Wegener --enable-shared links extensions to libpython statically (2006-11-22) http://python.org/sf/1600860 opened by Marien Zwart urllib2 does not close sockets properly (2006-11-23) http://python.org/sf/1601399 opened by Brendan Jurd utf_8_sig decode fails with buffer input (2006-11-23) http://python.org/sf/1601501 opened by bazwal Bugs Closed ___________ The empty set should be a subset of the empty set (2006-11-17) 
http://python.org/sf/1598166 closed by gbrandl import curses fails (2006-11-17) http://python.org/sf/1598357 closed by akuchling Misspelled submodule names for email module. (2006-11-17) http://python.org/sf/1598361 closed by gbrandl Tix: Subwidget names (2006-04-19) http://python.org/sf/1472877 closed by loewis replace groups doesn't work in this special case (2006-11-06) http://python.org/sf/1591319 closed by gbrandl csv module does not handle '\x00' (2006-11-19) http://python.org/sf/1599055 closed by gbrandl --disable-sunaudiodev --disable-tk does not work (2006-10-17) http://python.org/sf/1579029 closed by loewis htmlentitydefs.entitydefs assumes Latin-1 encoding (2006-11-19) http://python.org/sf/1599325 closed by loewis where is zlib??? (2006-11-04) http://python.org/sf/1590592 closed by sf-robot Segfault on bsddb.db.DB().type() (2006-11-20) http://python.org/sf/1599782 closed by nnorwitz problem with socket.gethostname documentation (2006-11-20) http://python.org/sf/1599879 closed by nnorwitz [PATCH] Quitting The Interpreter (2006-11-21) http://python.org/sf/1600157 closed by mwh os.popen w/o using the shell (2002-04-25) http://python.org/sf/548661 closed by nnorwitz memory leaks when importing posix module (2002-09-23) http://python.org/sf/613222 closed by nnorwitz docs missing 'trace' module (2003-07-29) http://python.org/sf/779976 closed by nnorwitz infinite __str__ recursion in thread causes seg fault (2003-07-31) http://python.org/sf/780714 closed by nnorwitz python and lithuanian locales (2003-11-02) http://python.org/sf/834452 closed by nnorwitz Bus error in extension with gcc 3.3 (2005-06-29) http://python.org/sf/1229788 closed by nnorwitz New / Reopened RFE __________________ urllib(2) should allow automatic decoding by charset (2006-11-19) http://python.org/sf/1599329 opened by Erik Demaine From steven.bethard at gmail.com Thu Nov 23 07:48:44 2006 From: steven.bethard at gmail.com (Steven Bethard) Date: Wed, 22 Nov 2006 23:48:44 -0700 Subject: 
[Python-Dev] DRAFT: python-dev summary for 2006-11-01 to 2006-11-15
Message-ID:

Here's the summary for the first half of November. Try not to spend it
all in one place! ;-) As always, corrections and comments are greatly
appreciated.

=============
Announcements
=============

--------------------------
Python 2.5 malloc families
--------------------------

Just a reminder that if you find your extension module is crashing with
Python 2.5 in malloc/free, there is a high chance that you have a
mismatch in malloc "families". Unlike previous versions, Python 2.5 no
longer allows sloppiness here -- if you allocate with the ``PyMem_*``
functions, you must free with the ``PyMem_*`` functions, and similarly,
if you allocate with the ``PyObject_*`` functions, you must free with
the ``PyObject_*`` functions.

Contributing thread:

- `2.5 portability problems `__

=========
Summaries
=========

----------------------------------
Path algebra and related functions
----------------------------------

Mike Orr started work on a replacement for `PEP 355`_ that would better
group the path-related functions currently in ``os``, ``os.path``,
``shutil`` and other modules. He proposed to start with a
`directory-tuple Path class`_ that would have allowed code like::

    # equivalent to
    # os.path.join(os.path.dirname(os.path.dirname(__FILE__)), "lib")
    os.path.Path(__FILE__)[:-2] + "lib"

where a Path object would act like a tuple of directories, and could be
easily sliced and reordered as such. As an alternative, glyph proposed
using `Twisted's filepath module`_ which was already being used in a
large body of code. He showed some common pitfalls, like that the
existence on Windows of "CON" and "NUL" in *every* directory can make
paths invalid, and indicated how FilePath solved these problems.

Fredrik Lundh suggested a reorganization where functions that
manipulate path *names* would reside in ``os.path``, and functions that
manipulate *objects* identified by a path would reside in ``os``.
The ``os.path`` module would gain a path wrapper object, which would
allow "path algebra" manipulations, e.g. ``path1 + path2``. The ``os``
module would gain some of the ``os.path`` and ``shutil`` functions that
were manipulating real filesystem objects and not just the path names.
Most people seemed to like this approach, because it correctly targeted
the "algebraic" features at the areas where chained operations were
most common: path name operations, not filesystem operations. Some of
the conversation moved on to the `Python 3000 list`_.

.. _PEP 355: http://www.python.org/dev/peps/pep-0355/
.. _directory-tuple Path class: http://wiki.python.org/moin/AlternativePathClass
.. _Twisted's filepath module: http://twistedmatrix.com/trac/browser/trunk/twisted/python/filepath.py
.. _Python 3000 list: http://mail.python.org/mailman/listinfo/python-3000

Contributing threads:

- `Path object design `__
- `Mini Path object `__
- `[Python-3000] Mini Path object `__

------------------
Replacing urlparse
------------------

A few more bugs in ``urlparse`` were turned up, and `earlier
discussions about replacing urlparse`_ were briefly revisited. Paul
Jimenez asked about `uriparse module`_ and was told that due to the
constant problems with ``urlparse``, people were concerned about
including the "incorrect" library again, so requirements were a little
stringent. Martin v. Löwis gave him some guidance on a few specific
points, and Nick Coghlan promised to try to post his `urischemes
module`_ (a derivative of Paul's `uriparse module`_) to the `Python
Package Index`_.

.. _earlier discussions about replacing urlparse: http://www.python.org/dev/summary/2006-06-01_2006-06-15/#rfc-3986-uniform-resource-identifiers-uris
.. _uriparse module: http://bugs.python.org/1462525
.. _urischemes module: http://bugs.python.org/1500504
.. _Python Package Index: http://www.python.org/pypi

Contributing threads:

- `patch 1462525 or similar solution?
`__
- `Path object design `__

----------------------------------
Importing .py, .pyc and .pyo files
----------------------------------

Martin v. Löwis brought up `Osvaldo Santana's patch`_ which would have
made Python search for both .pyc and .pyo files regardless of whether
or not the optimize flag, "-OO", was set (like zipimporter does).
Without this patch, when "-OO" was given, Python never looked for .pyc
files. Some people thought that an extra ``stat()`` call or directory
listing to check for the other file would be too expensive, but no one
profiled the various versions of the code so the cost was unclear.
People were leaning towards removing the extra functionality from
zipimporter so that at least it was consistent with the rest of Python.

Giovanni Bajo suggested that .pyo file support should be dropped
completely, with .pyc files being compiled at various levels of
optimization depending on the command line flags. To make sure all your
.pyc files were compiled at the same level of optimization, you'd use a
new "-I" flag to indicate that all files should be recompiled, e.g.
``python -I -OO app.py``.

Armin Rigo suggested only loading files with a .py extension. Python
would still generate .pyc files as a means of caching bytecode for
speed reasons, but it would never import them without a corresponding
.py file around. For people wanting to ship just bytecode, the cached
.pyc files could be renamed to .py files and then those could be
shipped and imported. There was some support for Armin's solution, but
it was not overwhelming.

.. _Osvaldo Santana's patch: http://bugs.python.org/1346572

Contributing thread:

- `Importing .pyc in -O mode and vice versa `__

---------------------------------------------------------------
The buffer protocol and communicating binary format information
---------------------------------------------------------------

The discussion of extending the buffer protocol to more binary formats
continued this fortnight.
Though the PIL_ had been used as an example of a library that could
benefit from an extended buffer protocol, Fredrik Lundh indicated that
future versions of the PIL_ would make the binary data model completely
opaque, and instead provide a view-style API like::

    view = object.acquire_view(region, supported formats)
    ... access data in view ...
    view.release()

Along these lines, the discussion turned away from the particular C
formats used in ``ctypes``, ``numpy``, ``array``, etc. and more towards
the best way to communicate format information between these modules.
Though it seemed like people were not completely happy with the
proposed API of the new buffer protocol, the discussion seemed to skirt
around any concrete suggestions for better APIs. In the end, the only
thing that seemed certain was that a new buffer protocol could only be
successful if it were implemented on all of the appropriate stdlib
modules: ``ctypes``, ``array``, ``struct``, etc.

.. _PIL: http://www.pythonware.com/products/pil/

Contributing threads:

- `PEP: Adding data-type objects to Python `__
- `PEP: Extending the buffer protocol to share array information. `__
- `idea for data-type (data-format) PEP `__

---------------
__dir__, part 2
---------------

Tomer Filiba continued his `previous investigations`_ into adding a
``__dir__()`` method to allow customization of the ``dir()`` builtin.
He moved most of the current ``dir()`` logic into ``object.__dir__()``,
with some additional logic necessary for modules and types being moved
to ``ModuleType.__dir__()`` and ``type.__dir__()`` respectively. He
posted a `patch for his implementation`_ and it got approval for Python
2.6. There was a brief discussion about whether or not it was okay for
an object to lie about its members, with Fredrik Lundh suggesting that
you should only be allowed to *add* to the result that ``dir()``
produces.
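The approved hook makes customizations like the following possible on Python 2.6 and later (a sketch; the classes here are illustrative, not from the patch):

```python
# A proxy that advertises its wrapped object's attributes through dir(),
# the "adding members" case discussed in the thread.

class Proxy(object):
    def __init__(self, target):
        self._target = target

    def __getattr__(self, name):
        # Fall back to the wrapped object for unknown attributes.
        return getattr(self._target, name)

    def __dir__(self):
        # Advertise the wrapped object's attributes alongside our own.
        return sorted(set(dir(type(self)) + list(self.__dict__) +
                          dir(self._target)))

class Point(object):
    x = 0
    y = 0

p = Proxy(Point())
assert 'x' in dir(p) and 'y' in dir(p)
```

Without the ``__dir__`` override, ``dir(p)`` would list only ``Proxy``'s own members and miss everything reachable through ``__getattr__``.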
Nick Coghlan pointed out that when a class overrides
``__getattribute__()``, attributes that the default ``dir()``
implementation sees can be blocked, in which case removing members from
the result of ``dir()`` might be quite appropriate.

.. _previous investigations: http://www.python.org/dev/summary/2006-07-01_2006-07-15/#adding-a-dir-magic-method
.. _patch for his implementation: http://bugs.python.org/1591665

Contributing thread:

- `__dir__, part 2 `__

--------------------------------
Invalid read errors and valgrind
--------------------------------

Using valgrind, Herman Geza found that he was getting some "Invalid
read" errors in PyObject_Free which weren't identified as acceptable in
Misc/README.valgrind. Tim Peters and Martin v. Löwis explained that
these are okay if they are reads from Py_ADDRESS_IN_RANGE. If the
address given is Python's own memory, a valid arena index is read.
Otherwise, garbage is read (though this read will never fail since
Python always reads from the page where the about-to-be-freed block is
located). The arenas are then checked to see whether the result was
garbage or not. Neal Norwitz promised to try to update
Misc/README.valgrind with this information.

Contributing thread:

- `valgrind `__

---------------------------
SCons and cross-compilation
---------------------------

Martin v. Löwis reviewed a `patch for cross-compilation`_ which
proposed to use SCons_ instead of distutils because updating distutils
to work for cross-compilation would have involved some fairly major
changes. Distutils had certain notions of where to look for header
files and how to invoke the compiler which were incorrect for
cross-compilation, and which were difficult to change. While accepting
the patch would not have required SCons_ to be added to Python proper
(which a number of people opposed), people didn't like the idea of
having to update SCons configuration in addition to already having to
update setup.py, Modules/Setup and the PCbuild area.
The patch was therefore rejected.

.. _patch for cross-compilation: http://bugs.python.org/841454
.. _SCons: http://www.scons.org/

Contributing thread:

- `Using SCons for cross-compilation `__

----------------------------
Individual interpreter locks
----------------------------

Robert asked about having a separate lock for each interpreter instance
instead of the global interpreter lock (GIL). Brett Cannon and Martin
v. Löwis explained that a variety of objects are shared between
interpreters, including:

* extension modules
* type objects (including exception types)
* singletons like ``None``, ``True``, ``()``, strings of length 1, etc.
* many things in the sys module

A single lock for each interpreter would not be sufficient for handling
access to such shared objects.

Contributing thread:

- `Feature Request: Py_NewInterpreter to create separate GIL (branch) `__

---------------------------
Passing floats to file.seek
---------------------------

Python's implementation of ``file.seek`` was converting floats to ints.
`Robert Church suggested a patch`_ that would convert floats to long
longs and thus support files larger than 2GiB. Martin v. Löwis proposed
instead to use the ``__index__()`` API to support the large files and
to raise an exception for float arguments. Martin's approach was
approved, with a warning instead of an exception for Python 2.6.

.. _Robert Church suggested a patch: http://bugs.python.org/1067760

Contributing thread:

- `Passing floats to file.seek `__

----------------------------------------
The datetime module and timezone objects
----------------------------------------

Fredrik Lundh asked about including a ``tzinfo`` object implementation
for the ``datetime`` module, along the lines of the ``UTC``,
``FixedOffset`` and ``LocalTimezone`` classes from the `library
reference`_. A number of people reported having copied those classes
into their own code repeatedly, and so Fredrik got the go-ahead to put
them into Python 2.6.

..
_library reference: http://docs.python.org/lib/datetime-tzinfo.html Contributing thread: - `ready-made timezones for the datetime module `__ ================ Deferred Threads ================ - `Summer of Code: zipfile? `__ - `Results of the SOC projects `__ ================== Previous Summaries ================== - `The "lazy strings" patch [was: PATCH submitted: Speed up + for string concatenation, now as fast as "".join(x) idiom] `__ =============== Skipped Threads =============== - `RELEASED Python 2.3.6, FINAL `__ - `[Tracker-discuss] Getting Started `__ - `Status of pairing_heap.py? `__ - `Inconvenient filename in sandbox/decimal-c/new_dt `__ - `test_ucn fails for trunk on x86 Ubuntu Edgy `__ - `Weekly Python Patch/Bug Summary `__ - `Last chance to join the Summer of PyPy! `__ - `[Python-checkins] r52692 - in python/trunk: Lib/mailbox.py Misc/NEWS `__ - `PyFAQ: help wanted with thread article `__ - `Arlington sprint this Saturday `__ - `Suggestion/ feature request `__ From ncoghlan at gmail.com Thu Nov 23 10:59:09 2006 From: ncoghlan at gmail.com (Nick Coghlan) Date: Thu, 23 Nov 2006 19:59:09 +1000 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <4564C5E6.8070605@v.loewis.de> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com> <4564C5E6.8070605@v.loewis.de> Message-ID: <456570ED.8020706@gmail.com> Martin v. L?wis wrote: > Nick Coghlan schrieb: >> Martin v. L?wis wrote: >>> I personally consider it "good style" to rely on implementation details >>> of CPython; >> Is there a 'do not' missing somewhere in there? > > No - I really mean it. I can find nothing wrong with people relying on > reference counting to close files, for example. It's a property of > CPython, and not guaranteed in other Python implementations - yet it > works in a well-defined way in CPython. 
Code that relies on that feature > is not portable, but portability is only one goal in software > development, and may be irrelevant for some projects. Cool, that's what I thought you meant (and it's a point I actually agree with). I was uncertain enough about your intent that I felt it was worth asking the question, though :) Cheers, Nick. -- Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia --------------------------------------------------------------- http://www.boredomandlaziness.org From arigo at tunes.org Thu Nov 23 12:45:09 2006 From: arigo at tunes.org (Armin Rigo) Date: Thu, 23 Nov 2006 12:45:09 +0100 Subject: [Python-Dev] DRAFT: python-dev summary for 2006-11-01 to 2006-11-15 In-Reply-To: References: Message-ID: <20061123114509.GA7900@code0.codespeak.net> Hi Steven, On Wed, Nov 22, 2006 at 11:48:44PM -0700, Steven Bethard wrote: > (... pyc files ...) > For people wanting to ship just bytecode, the cached > .pyc files could be renamed to .py files and then those could be > shipped and imported. Yuk! Not renamed to .py files. Distributing .py files that are actually bytecode looks like a new funny way to create confusion. No, I was half-heartedly musing about introducing Yet Another file extension (.pyc for caching and .pyX for importable bytecode, or possibly the other way around). A bientot, Armin From fredrik at pythonware.com Thu Nov 23 18:06:45 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 23 Nov 2006 18:06:45 +0100 Subject: [Python-Dev] DRAFT: python-dev summary for 2006-11-01 to 2006-11-15 In-Reply-To: <20061123114509.GA7900@code0.codespeak.net> References: <20061123114509.GA7900@code0.codespeak.net> Message-ID: Armin Rigo wrote: > Yuk! Not renamed to .py files. Distributing .py files that are > actually bytecode looks like a new funny way to create confusion. 
No, I
> was half-heartedly musing about introducing Yet Another file extension
> (.pyc for caching and .pyX for importable bytecode, or possibly the
> other way around).

an alternative would be to only support source-less PYC import from ZIP
archives (or other non-filesystem importers).

From theller at ctypes.org Fri Nov 24 20:19:54 2006
From: theller at ctypes.org (Thomas Heller)
Date: Fri, 24 Nov 2006 20:19:54 +0100
Subject: [Python-Dev] ctypes and powerpc
Message-ID: <456745DA.3010903@ctypes.org>

I'd like to ask for help with an issue which I do not know how to
solve.

Please see this bug http://python.org/sf/1563807
"ctypes built with GCC on AIX 5.3 fails with ld ffi error"

Apparently this is a powerpc machine, ctypes builds but cannot be
imported because of undefined symbols like 'ffi_call',
'ffi_prep_closure'.

These symbols are defined in file
Modules/_ctypes/libffi/src/powerpc/ffi_darwin.c.
The whole contents of this file is enclosed within a

#ifdef __ppc__
...
#endif

block. IIRC, this block has been added by Ronald for the Mac universal
build. Now, it seems that on the AIX machine the __ppc__ symbol is not
defined; removing the #ifdef/#endif makes the build successful.

We have asked (in the SF bug tracker) for the symbols that are defined;
one guy has executed 'gcc -v -c empty.c' and posted the output, as far
as I see these are the symbols defined in gcc:

-D__GNUC__=2 -D__GNUC_MINOR__=9 -D_IBMR2 -D_POWER -D_AIX -D_AIX32
-D_AIX41 -D_AIX43 -D_AIX51 -D_LONG_LONG -D_IBMR2 -D_POWER -D_AIX
-D_AIX32 -D_AIX41 -D_AIX43 -D_AIX51 -D_LONG_LONG -Asystem(unix)
-Asystem(aix) -D__CHAR_UNSIGNED__ -D_ARCH_COM

What should we do now? Should the conditional be changed to

#if defined(__ppc__) || defined(_POWER)

or should we suggest to add '-D__ppc__' to the CFLAGS env var, or what?
Any suggestions?
Thanks, Thomas From theller at ctypes.org Fri Nov 24 20:59:41 2006 From: theller at ctypes.org (Thomas Heller) Date: Fri, 24 Nov 2006 20:59:41 +0100 Subject: [Python-Dev] ctypes and powerpc In-Reply-To: <456745DA.3010903@ctypes.org> References: <456745DA.3010903@ctypes.org> Message-ID: <45674F2D.7050203@ctypes.org> Thomas Heller schrieb: > I'd like to ask for help with an issue which I do not know > how to solve. > > Please see this bug http://python.org/sf/1563807 > "ctypes built with GCC on AIX 5.3 fails with ld ffi error" > > Apparently this is a powerpc machine, ctypes builds but cannot be imported > because of undefined symbols like 'ffi_call', 'ffi_prep_closure'. > > These symbols are defined in file > Modules/_ctypes/libffi/src/powerpc/ffi_darwin.c. > The whole contents of this file is enclosed within a > > #ifdef __ppc__ > ... > #endif > > block. IIRC, this block has been added by Ronald for the > Mac universal build. Now, it seems that on the AIX machine > the __ppc__ symbols is not defined; removing the #ifdef/#endif > makes the built successful. Of course, the simple solution would be to change it to: #ifndef __i386__ ... #endif Thomas From martin at v.loewis.de Sat Nov 25 08:23:21 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sat, 25 Nov 2006 08:23:21 +0100 Subject: [Python-Dev] ctypes and powerpc In-Reply-To: <456745DA.3010903@ctypes.org> References: <456745DA.3010903@ctypes.org> Message-ID: <4567EF69.9080601@v.loewis.de> Thomas Heller schrieb: > What should we do now? Should the conditional be changed to > > #if defined(__ppc__) || defined(_POWER) > This would be the right test, if you want to test for "power-pc like". POWER and PowerPC are different processor architectures, IBM pSeries machines (now System p) have POWER processors; this is the predecessor of the PowerPC architecture (where PowerPC omitted some POWER features, and added new ones). 
Recent POWER processors (POWER3 and later, since 1997) are apparently
PowerPC-compatible. Still, AIX probably continues to define _POWER for
backwards-compatibility (back to RS/6000 times).

Regards,
Martin

From ronaldoussoren at mac.com Sat Nov 25 08:24:07 2006
From: ronaldoussoren at mac.com (Ronald Oussoren)
Date: Fri, 24 Nov 2006 23:24:07 -0800
Subject: [Python-Dev] ctypes and powerpc
In-Reply-To: <456745DA.3010903@ctypes.org>
References: <456745DA.3010903@ctypes.org>
Message-ID:

On Friday, November 24, 2006, at 08:21PM, "Thomas Heller" wrote:

>I'd like to ask for help with an issue which I do not know
>how to solve.
>
>Please see this bug http://python.org/sf/1563807
>"ctypes built with GCC on AIX 5.3 fails with ld ffi error"
>
>Apparently this is a powerpc machine, ctypes builds but cannot be imported
>because of undefined symbols like 'ffi_call', 'ffi_prep_closure'.
>
>These symbols are defined in file
> Modules/_ctypes/libffi/src/powerpc/ffi_darwin.c.
>The whole contents of this file is enclosed within a
>
>#ifdef __ppc__
>...
>#endif
>
>block. IIRC, this block has been added by Ronald for the
>Mac universal build. Now, it seems that on the AIX machine
>the __ppc__ symbol is not defined; removing the #ifdef/#endif
>makes the build successful.

The defines were indeed added for the universal build and I completely
overlooked the fact that ffi_darwin.c is also used for AIX.

One way to fix this is

#if ! (defined(__APPLE__) && !defined(__ppc__))
...
#endif

That is, compile the file unless __APPLE__ is defined but __ppc__
isn't. This more clearly documents the intent.
> >We have asked (in the SF bug tracker) for the symbols that are defined;
>one guy has executed 'gcc -v -c empty.c' and posted the output, as far as I
>see these are the symbols defined in gcc:
>
>-D__GNUC__=2
>-D__GNUC_MINOR__=9 -D_IBMR2 -D_POWER -D_AIX -D_AIX32 -D_AIX41 -D_AIX43
>-D_AIX51 -D_LONG_LONG -D_IBMR2 -D_POWER -D_AIX -D_AIX32 -D_AIX41 -D_AIX43
>-D_AIX51 -D_LONG_LONG -Asystem(unix) -Asystem(aix) -D__CHAR_UNSIGNED__
>-D_ARCH_COM
>
>What should we do now? Should the conditional be changed to
>
>#if defined(__ppc__) || defined(_POWER)
>
>or should we suggest to add '-D__ppc__' to the CFLAGS env var, or what?
>Any suggestions?
>
>Thanks,
>Thomas
>
>_______________________________________________
>Python-Dev mailing list
>Python-Dev at python.org
>http://mail.python.org/mailman/listinfo/python-dev
>Unsubscribe: http://mail.python.org/mailman/options/python-dev/ronaldoussoren%40mac.com
>

From tomerfiliba at gmail.com Sun Nov 26 16:40:52 2006
From: tomerfiliba at gmail.com (tomer filiba)
Date: Sun, 26 Nov 2006 17:40:52 +0200
Subject: [Python-Dev] infinities
Message-ID: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com>

i found several places in my code where i use positive infinity
(posinf) for various things, i.e.,

def readline(self, limit = -1):
    if limit < 0:
        limit = 1e10000 # posinf
    chars = []
    while limit > 0:
        ch = self.read(1)
        chars.append(ch)
        if not ch or ch == "\n":
            break
        limit -= 1
    return "".join(chars)

i like the concept, but i hate the "1e10000" stuff... why not add
posinf, neginf, and nan to the float type? i find it much more readable
as:

if limit < 0:
    limit = float.posinf

posinf, neginf and nan are singletons, so there's no problem with
adding as members to the type.
-tomer From bob at redivi.com Sun Nov 26 16:52:24 2006 From: bob at redivi.com (Bob Ippolito) Date: Sun, 26 Nov 2006 10:52:24 -0500 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> Message-ID: <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> On 11/26/06, tomer filiba wrote: > i found several places in my code where i use positive infinity > (posinf) for various things, i.e., > > def readline(self, limit = -1): > if limit < 0: > limit = 1e10000 # posinf > chars = [] > while limit > 0: > ch = self.read(1) > chars.append(ch) > if not ch or ch == "\n": > break > limit -= 1 > return "".join(chars) > > i like the concept, but i hate the "1e10000" stuff... why not add > posint, neginf, and nan to the float type? i find it much more readable as: > > if limit < 0: > limit = float.posinf > > posinf, neginf and nan are singletons, so there's no problem with > adding as members to the type. sys.maxint makes more sense there. Or you could change it to "while limit != 0" and set it to -1 (though I probably wouldn't actually do that)... There is already a PEP 754 for float constants, which is implemented in the fpconst module (see CheeseShop). It's not (yet) part of the stdlib though. -bob From tomerfiliba at gmail.com Sun Nov 26 18:07:08 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 26 Nov 2006 19:07:08 +0200 Subject: [Python-Dev] infinities In-Reply-To: <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> Message-ID: <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> > sys.maxint makes more sense there. 
no, it requires *infinity* to accomplish x - y == x; y != 0, for example: while limit > 0: limit -= len(chunk) with limit = posinf, the above code should be equivalent to "while True". > There is already a PEP 754 for float constants okay, that would suffice. but why isn't it part of stdlib already? the pep is three years old... it should either be rejected or accepted. meanwhile, there are lots of missing API functions in the floating-point implementation... besides, all the suggested APIs should be part of the float type, not a separate module. here's what i want: >>> f = 5.0 >>> f.is_infinity() False >>> float.PosInf 1.#INF -tomer On 11/26/06, Bob Ippolito wrote: > On 11/26/06, tomer filiba wrote: > > i found several places in my code where i use positive infinity > > (posinf) for various things, i.e., > > > > def readline(self, limit = -1): > > if limit < 0: > > limit = 1e10000 # posinf > > chars = [] > > while limit > 0: > > ch = self.read(1) > > chars.append(ch) > > if not ch or ch == "\n": > > break > > limit -= 1 > > return "".join(chars) > > > > i like the concept, but i hate the "1e10000" stuff... why not add > > posint, neginf, and nan to the float type? i find it much more readable as: > > > > if limit < 0: > > limit = float.posinf > > > > posinf, neginf and nan are singletons, so there's no problem with > > adding as members to the type. > > sys.maxint makes more sense there. Or you could change it to "while > limit != 0" and set it to -1 (though I probably wouldn't actually do > that)... > > There is already a PEP 754 for float constants, which is implemented > in the fpconst module (see CheeseShop). It's not (yet) part of the > stdlib though. 
> > -bob > From fredrik at pythonware.com Sun Nov 26 18:13:16 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 26 Nov 2006 18:13:16 +0100 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> Message-ID: tomer filiba wrote: > no, it requires *infinity* to accomplish x - y == x; y != 0, for example: > > while limit > 0: > limit -= len(chunk) > > with limit = posinf, the above code should be equivalent to "while True". that's a remarkably stupid way to count bytes. if you want to argue for additions to the language, you could at least bother to come up with a sane use case. From pje at telecommunity.com Sun Nov 26 18:59:13 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 26 Nov 2006 12:59:13 -0500 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.co m> References: <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> Message-ID: <5.1.1.6.0.20061126125541.027f7198@sparrow.telecommunity.com> At 07:07 PM 11/26/2006 +0200, tomer filiba wrote: > > sys.maxint makes more sense there. >no, it requires *infinity* to accomplish x - y == x; y != 0, for example: > >while limit > 0: > limit -= len(chunk) Um, you do realize that you're not going to be able to fit sys.maxint strings into a list, right? That's over 2 billion *pointers* worth of memory, so at least 8 gigabytes on a 32-bit machine... that probably can't address more than 4 gigabytes of memory to start with. The code will fail with MemoryError long before you exhaust sys.maxint, even in the case where you're using only 1-character strings. 
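The fpconst module Bob mentions (PEP 754) avoids overflowing literals entirely by packing the IEEE-754 bit patterns directly, which is why it only works where doubles are IEEE-754. A minimal sketch of that approach — the helper names below are illustrative, not fpconst's exact API:

```python
import struct

# IEEE-754 double-precision bit patterns, big-endian:
# sign bit, 11 exponent bits (all ones for inf/nan), 52 fraction bits.
POSINF = struct.unpack(">d", b"\x7f\xf0\x00\x00\x00\x00\x00\x00")[0]
NEGINF = struct.unpack(">d", b"\xff\xf0\x00\x00\x00\x00\x00\x00")[0]
NAN    = struct.unpack(">d", b"\x7f\xf8\x00\x00\x00\x00\x00\x00")[0]

def is_inf(x):
    return x == POSINF or x == NEGINF

def is_nan(x):
    return x != x   # NaN is the only float unequal to itself

assert is_inf(POSINF) and is_inf(NEGINF)
assert not is_inf(1.5)
assert is_nan(NAN) and not is_nan(POSINF)
```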
From julvar at tamu.edu Tue Nov 21 07:29:45 2006 From: julvar at tamu.edu (Julian) Date: Tue, 21 Nov 2006 00:29:45 -0600 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <45629636.6090407@v.loewis.de> Message-ID: <009201c70d36$714dd5f0$24b75ba5@aero.ad.tamu.edu> > -----Original Message----- > From: "Martin v. L?wis" [mailto:martin at v.loewis.de] > Sent: Tuesday, November 21, 2006 12:01 AM > To: Julian > Cc: python-dev at python.org > Subject: Re: [Python-Dev] Suggestion/ feature request > > Julian schrieb: > > I am using python with swig and I get a lot of macro redefinition > > warnings like so: > > warning C4005: '_CRT_SECURE_NO_DEPRECATE' : macro redefinition > > > > In the file - pyconfig.h - rather than the following lines, I was > > wondering if it would be more reasonable to use #ifdef > statements as > > shown in the bottom of the email... > > While I agree that would be reasonable, I also wonder why you > are getting these errors. Where is the first definition of > these macros, and how is the macro defined at the first definition? > > Regards, > Martin In my specific case, the order of the definitions in any wrapper file created by SWIG (I am using Version 1.3.30) looks like this: //example_wrap.cxx //snipped code /* Deal with Microsoft's attempt at deprecating C standard runtime functions */ #if !defined(SWIG_NO_CRT_SECURE_NO_DEPRECATE) && defined(_MSC_VER) && !defined(_CRT_SECURE_NO_DEPRECATE) # define _CRT_SECURE_NO_DEPRECATE #endif /* Python.h has to appear first */ #include //snipped code SWIG seems to have done it properly by checking to see if it has been defined already (which, I think, is how python should do it as well) Now, even if I am not using SWIG, I could imagine these being defined elsewhere (by other headers/libraries) or even by setting them in the VS2005 IDE project settings (which I actually do sometimes). 
While these are *just* warnings and not errors, it would look cleaner if pyconfig.h would check if they were defined already. Julian. From julvar at tamu.edu Tue Nov 21 20:13:09 2006 From: julvar at tamu.edu (Julian) Date: Tue, 21 Nov 2006 13:13:09 -0600 Subject: [Python-Dev] Suggestion/ feature request In-Reply-To: <456343E3.4000203@v.loewis.de> Message-ID: <000001c70da1$15260f20$24b75ba5@aero.ad.tamu.edu> > -----Original Message----- > From: "Martin v. L?wis" [mailto:martin at v.loewis.de] > Sent: Tuesday, November 21, 2006 12:22 PM > To: Julian > Cc: python-dev at python.org > Subject: Re: [Python-Dev] Suggestion/ feature request > > Julian schrieb: > > SWIG seems to have done it properly by checking to see if > it has been > > defined already (which, I think, is how python should do it > as well) > > Now, even if I am not using SWIG, I could imagine these > being defined > > elsewhere (by other headers/libraries) or even by setting > them in the > > VS2005 IDE project settings (which I actually do sometimes). While > > these are *just* warnings and not errors, it would look cleaner if > > pyconfig.h would check if they were defined already. > > Sure; I have fixed this now in r52817 and r52818 > > I just wondered why you get the warning: you shouldn't get > one if the redefinition is the same as the original one. In > this case, it wasn't the same redefinition, as SWIG was > merely defining them, and Python was defining them to 1. > > Regards, > Martin > Thanks! you are right... I didn't know that ! I have two questions though... Is there any reason why Python is defining them to 1? In pyconfig.h, there is: #ifndef _CRT_SECURE_NO_DEPRECATE #define _CRT_SECURE_NO_DEPRECATE 1 #endif And then later on in the same file: /* Turn off warnings about deprecated C runtime functions in VisualStudio .NET 2005 */ #if _MSC_VER >= 1400 && !defined _CRT_SECURE_NO_DEPRECATE #define _CRT_SECURE_NO_DEPRECATE #endif Isn't that redundant? 
I don't think that second block will ever get executed. Moreover, in the second block, it is not being defined to 1. why is that ? Julian. From imurdock at imurdock.com Wed Nov 22 17:09:35 2006 From: imurdock at imurdock.com (Ian Murdock) Date: Wed, 22 Nov 2006 11:09:35 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: Hi everyone, Guido van Rossum suggested I send this email here. I'm CTO of the Free Standards Group and chair of the Linux Standard Base (LSB), the interoperability standard for the Linux distributions. We're wanting to add Python to the next version of the LSB (LSB 3.2) [1] and are looking for someone (or, better, a few folks) in the Python community to help us lead the effort to do that. The basic goal is to standardize the Python environment compliant Linux distributions (Red Hat, SUSE, Debian, Ubuntu, etc.) provide so that application developers can write LSB compliant applications in Python. [1] http://www.freestandards.org/en/LSB_Roadmap The first question we have to answer is: What does it mean to "add Python to the LSB"? Is it enough to say that Python is present at a certain version and above, or do we need to do more than that (e.g., many distros ship numerous Python add-ons which apps may or may not rely on--do we need to specific some of these too)? What would be the least common denominator version? Answering this question will require us to look at the major Linux distros (RHEL, SLES, Debian, Ubuntu, etc.) to see what versions they ship. And so on. Once we've decided how best to specify that Python is present, how do we test that it is indeed present? Of course, there's the existing Python test suites, so there shouldn't be a lot of work to do here. Another question is how to handle binary modules. The LSB provides strict backward compatibility at the binary level, even across major versions, and that may or may not be appropriate for Python. 
The LSB is mostly concerned with backward compatibility from an application developer's point of view, and this would seem to mean largely 100% Python, whereas C extensions would seem to be largely the domain of component developers, such as Python access to Gtk or other OS services (here, we'd probably look to add those components to the LSB directly rather than specifying the Python ABI so they can be maintained separately). Of course I could be wrong about this. Anyway, as you can see, there are numerous issues to work out here. If anyone is interested in getting involved, please drop me a line, and I'd be happy to answer any questions (discussion on any of the topics above would be welcomed as well). Finally, for any Python developers in and around Berlin, the LSB is holding its next face to face meeting in Berlin December 4-6, where the LSB 3.2 roadmap will be finalized. If you could find some time to stop by and talk with us, we would deeply appreciate it: http://www.freestandards.org/en/LSB_face-to-face_%28December_2006%29 Thanks, -ian -- Ian Murdock 317-863-2590 http://ianmurdock.com/ "Don't look back--something might be gaining on you." --Satchel Paige From cfarwell at mac.com Sun Nov 26 13:45:03 2006 From: cfarwell at mac.com (Chris Farwell) Date: Sun, 26 Nov 2006 12:45:03 +0000 Subject: [Python-Dev] (no subject) Message-ID: <3471ECA0-BDC2-4830-8901-4E274F0EF802@mac.com> Mr. Rossum, I saw an old post you made about the Google Internships (Jan 25,2006). As a prospective for next summer, you mention that it would be in my best interest to contact brett Cannon. I have many questions I'd love to have answered, how do I go about contacting him? I look forward to your reply. Chris Farwell cfarwell at mac.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061126/03848bea/attachment-0001.html From martin at v.loewis.de Sun Nov 26 19:48:29 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 26 Nov 2006 19:48:29 +0100 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> <1d85506f0611260907j7cadf216md2a69be2b1ebc21c@mail.gmail.com> Message-ID: <4569E17D.3070608@v.loewis.de> tomer filiba schrieb: > okay, that would suffice. but why isn't it part of stdlib already? > the pep is three years old... it should either be rejected or accepted. > meanwhile, there are lots of missing API functions in the floating-point > implementation... It's not rejected because people keep requesting the feature, and not accepted because it's not implementable in general (i.e. it is difficult to implement on platforms where the double type is not IEEE-754). Regards, Martin From martin at v.loewis.de Sun Nov 26 20:10:12 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Sun, 26 Nov 2006 20:10:12 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: Message-ID: <4569E694.7040901@v.loewis.de> Ian Murdock schrieb: > I'm CTO of the Free Standards Group and chair of the Linux Standard > Base (LSB), the interoperability standard for the Linux distributions. > We're wanting to add Python to the next version of the LSB (LSB 3.2) [1] > and are looking for someone (or, better, a few folks) in the Python > community to help us lead the effort to do that. The basic goal > is to standardize the Python environment compliant Linux distributions > (Red Hat, SUSE, Debian, Ubuntu, etc.) provide so that > application developers can write LSB compliant applications in Python. 
I wrote to Ian that I would be interested; participating in the meeting in Berlin is quite convenient. I can try to keep python-dev updated. Regards, Martin From aahz at pythoncraft.com Sun Nov 26 20:20:27 2006 From: aahz at pythoncraft.com (Aahz) Date: Sun, 26 Nov 2006 11:20:27 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <4569E694.7040901@v.loewis.de> References: <4569E694.7040901@v.loewis.de> Message-ID: <20061126192026.GA5909@panix.com> On Sun, Nov 26, 2006, "Martin v. L?wis" wrote: > > I wrote to Ian that I would be interested; participating in the meeting > in Berlin is quite convenient. I can try to keep python-dev updated. Please do -- it's not something I have a lot of cycles for but am interested in. -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Usenet is not a democracy. It is a weird cross between an anarchy and a dictatorship. From pje at telecommunity.com Sun Nov 26 20:41:16 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 26 Nov 2006 14:41:16 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: Message-ID: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> At 11:09 AM 11/22/2006 -0500, Ian Murdock wrote: >The first question we have to answer is: What does it mean to "add >Python to the LSB"? Is it enough to say that Python is present >at a certain version and above, or do we need to do more than that >(e.g., many distros ship numerous Python add-ons which apps >may or may not rely on--do we need to specific some of these too)? Just a suggestion, but one issue that I think needs addressing is the FHS language that leads some Linux distros to believe that they should change Python's normal installation layout (sometimes in bizarre ways) or that they should remove and separately package different portions of the standard library. 
Other vendors apparently also patch Python in various ways to support their FHS-based theories of how Python should install files. These changes are detrimental to compatibility. Another issue is specifying dependencies. The existence of the Cheeseshop as a central registry of Python project names has not been taken into account in vendor packaging practices, for example. (Python 2.5 also introduced the ability to install metadata alongside installed Python packages, supporting runtime checking for package presence and versions.) I don't know how closely these issues tie into what the LSB is tying to do, as I've only observed these issues in the breach, where certain distribution policies require e.g. that project names be replaced with internal package names, demand separation of package data files from their packages, or other procrustean chopping that makes mincemeat of any attempt at multi-distribution compatibility for an application or multi-dependency library. Some clarification at the LSB level of what is actually considered standard for Python might perhaps be helpful in motivating updates to some of these policies. From tomerfiliba at gmail.com Sun Nov 26 20:57:08 2006 From: tomerfiliba at gmail.com (tomer filiba) Date: Sun, 26 Nov 2006 21:57:08 +0200 Subject: [Python-Dev] infinities In-Reply-To: <5.1.1.6.0.20061126125541.027f7198@sparrow.telecommunity.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> <6a36e7290611260752w6dc208c0nd2310e7cee0114fd@mail.gmail.com> <5.1.1.6.0.20061126125541.027f7198@sparrow.telecommunity.com> Message-ID: <1d85506f0611261157m15bcf761vdc5e1f57960f19f8@mail.gmail.com> > Um, you do realize that you're not going to be able to fit sys.maxint > strings into a list, right? i can multiply by four, thank you. of course i don't expect anyone to read a string *that* long. besides, this *particular example* isn't important, it was just meant to show why someone might want to use it. 
why are people being so picky about the details of an example code? first of all, a "while True" loop is not limited by sys.maxint, so i see no reason why i couldn't get the same result by subtracting from infinity. that may seem blunt, but it's a good way have the same code handle both cases (limited and unlimited reading). all i was asking for was a better way to express and handle infinity (and nan), instead of the poor-man's version of "nan = 2e2222/3e3333". float.posinf or float.isinf(5.0) seem the right way to me. for some reference, it seemed the right way to other people too: http://msdn2.microsoft.com/en-gb/library/system.double_methods.aspx http://java.sun.com/j2se/1.4.2/docs/api/java/lang/Float.html the third-party fp module is nice, but it ought to be part of the float type, or at least part of stdlib. - - - - - - if it were up to me, *literals* producing infinity would be a syntax error (of course i would allow computations to result in infinity). for the reason why, consider this: >>> 1e11111 == 2e22222 True -tomer From talin at acm.org Sun Nov 26 21:24:03 2006 From: talin at acm.org (Talin) Date: Sun, 26 Nov 2006 12:24:03 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see Message-ID: <4569F7E3.9040004@acm.org> I've been looking once again over the docs for distutils and setuptools, and thinking to myself "this seems a lot more complicated than it ought to be". Before I get into detail, however, I want to explain carefully the scope of my critique - in particular, why I am talking about setuptools on the python-dev list. You see, in my mind, the process of assembling, distributing, and downloading a package is, or at least ought to be, a unified process. It ought to be a fundamental part of the system, and not split into separate tools with separate docs that have to be mentally assembled in order to understand it. 
Moreover, setuptools is the defacto standard these days - a novice programmer who googles for 'python install tools' will encounter setuptools long before they learn about distutils; and if you read the various mailing lists and blogs, you'll sense a subtle aura of deprecation and decay that surrounds distutils. I would claim, then, that regardless of whether setuptools is officially blessed or not, it is an intrinstic part of the "Python experience". (I'd also like to put forward the disclaimer that there are probably factual errors in this post, or errors of misunderstanding; All I can claim as an excuse is that it's not for lack of trying, and corrections are welcome as always.) Think about the idea of module distribution from a pedagogical standpoint - when does a newbie Python programmer start learning about module distribution and what do they learn first? A novice Python user will begin by writing scripts for themselves, and not thinking about distribution at all. However, once they reach the point where they begin to think about packaging up their module, the Python documentation ought to be able to lead them, step by step, towards a goal of making a distributable package: -- It should teach them how to organize their code into packages and modules -- It should show them how to write the proper setup scripts -- If there is C code involved, it should explain how that fits into the picture. -- It should explain how to write unit tests and where they should go. So how does the current system fail in this regard? The docs for each component - distutils, setuptools, unit test frameworks, and so on, only talk about that specific module - not how it all fits together. For example, the docs for distutils start by telling you how to build a setup script. It never explains why you need a setup script, or why Python programs need to be "installed" in the first place. [1] The distutils docs never describe how your directory structure ought to look. 
In fact, they never tell you how to *write* a distributable package; rather, it seems to be more oriented towards taking an already-working package and modifying it to be distributable. The setuptools docs are even worse in this regard. If you look carefully at the docs for setuptools, you'll notice that each subsection is effectively a 'diff', describing how setuputils is different from distutils. One section talks about the "new and changed keywords", without explaining what the old keywords were or how to find them. Thus, for the novice programmer, learning how to write a setup script ends up being a process of flipping back and forth between the distutils and setuptools docs, trying to hold in their minds enough of each to be able to achieve some sort of understanding. What we have now does a good job of explaining how the individual tools work, but it doesn't do a good job of answering the question "Starting from an empty directory, how do I create a distributable Python package?" A novice programmer wants to know what to create first, what to create next, and so on. This is especially true if the novice programmer is creating an extension module. Suppose I have a C library that I need to wrap. In order to even compile and test it, I'm going to need a setup script. That means I need to understand distutils before I even think about distribution, before I even begin writing the code! (Sure, I could write a Makefile, but I'd only end up throwing it away later -- so why not cut to the chase and *start* with a setup script? Ans: Because it's too hard!) But it isn't just the docs that are at fault here - otherwise, I'd be posting this on a different mailing list. It seems like the whole architecture is 'diff'-based, a series of patches on top of patches, which are in need of some serious refactoring. Except that nobody can do this refactoring, because there's no formal list of requirements. 
I look at distutils, and while some parts are obvious, there are other parts where I go "what problem were they trying to solve here?" In my experience, you *don't* go mucking with someone's code and trying to fix it unless you understand what problem they were trying to solve - otherwise you'll botch it and make a mess. Since few people ever bother to write down what problem they were trying to solve (although they tend to be better at describing their clever solution), usually this ends up being done through a process of reverse engineering the requirements from the code, unless you are lucky enough to have someone around who knows the history of the thing. Admittedly, I'm somewhat in ignorance here. My perspective is that of an 'end-user developer', someone who uses these tools but does not write them. I don't know the internals of these tools, nor do I particularly want to - I've got bigger fish to fry. I'm posting this here because what I'd like folks to think about is the whole process of Python development, not just the documentation. What is the smoothest path from empty directory to a finished package on PyPI? What can be changed about the current standard libraries that will ease this process? [1] The answer, AFAICT, is that 'setup' is really a Makefile - in other words, its a platform-independent way of describing how to construct a compiled module from sources, and making it available to all programs on that system. Although this gets confusing when we start talking about "pure python" modules that have no C component - because we have all this language that talks about compiling and installing and such, when all that is really going on underneath is a plain old file copy. 
-- Talin From fredrik at pythonware.com Sun Nov 26 21:35:03 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Sun, 26 Nov 2006 21:35:03 +0100 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <4569F7E3.9040004@acm.org> References: <4569F7E3.9040004@acm.org> Message-ID: Talin wrote: > But it isn't just the docs that are at fault here - otherwise, I'd be > posting this on a different mailing list. It seems like the whole > architecture is 'diff'-based, a series of patches on top of patches, > which are in need of some serious refactoring. so to summarize, you want someone to rewrite the code and write new documentation, and since you didn't even have time to make your post shorter, that someone will obviously not be you ? From talin at acm.org Sun Nov 26 21:48:05 2006 From: talin at acm.org (Talin) Date: Sun, 26 Nov 2006 12:48:05 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: References: <4569F7E3.9040004@acm.org> Message-ID: <4569FD85.4010006@acm.org> Fredrik Lundh wrote: > Talin wrote: > >> But it isn't just the docs that are at fault here - otherwise, I'd be >> posting this on a different mailing list. It seems like the whole >> architecture is 'diff'-based, a series of patches on top of patches, >> which are in need of some serious refactoring. > > so to summarize, you want someone to rewrite the code and write new > documentation, and since you didn't even have time to make your post > shorter, that someone will obviously not be you ? Oh, it was a lot longer when I started :) As far as rewriting it goes - I can only rewrite things that I understand. 
> From sluggoster at gmail.com Sun Nov 26 22:21:55 2006 From: sluggoster at gmail.com (Mike Orr) Date: Sun, 26 Nov 2006 13:21:55 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <4569F7E3.9040004@acm.org> References: <4569F7E3.9040004@acm.org> Message-ID: <6e9196d20611261321m5142989yef4c9180ebc9427e@mail.gmail.com> On 11/26/06, Talin wrote: > I've been looking once again over the docs for distutils and setuptools, > and thinking to myself "this seems a lot more complicated than it ought > to be". > > Before I get into detail, however, I want to explain carefully the scope > of my critique - in particular, why I am talking about setuptools on the > python-dev list. You see, in my mind, the process of assembling, > distributing, and downloading a package is, or at least ought to be, a > unified process. It ought to be a fundamental part of the system, and > not split into separate tools with separate docs that have to be > mentally assembled in order to understand it. > > Moreover, setuptools is the defacto standard these days - a novice > programmer who googles for 'python install tools' will encounter > setuptools long before they learn about distutils; and if you read the > various mailing lists and blogs, you'll sense a subtle aura of > deprecation and decay that surrounds distutils. Look at the current situation as more of an evoluntionary point than a finished product. There's widespread support for integrating setuptools into Python as you suggest. I've heard it discussed at Pycon the past two years. The reason it hasn't been done yet is technical, from what I've heard. Distutils is apparently difficult to patch correctly and could stand a rewrite. I'm currently studying the Pylons implementation and thus having to learn more about entry points, resources, ini files used by eggs, etc. This requires studying three different pages on the peak.telecommunity.com site -- exactly the problem you're describing. 
A comprehensive third-party manual that integrates the documentation would be a good place to start. Even the outline of such a manual would be a good. That would give a common baseline of understanding for package users, package developers, and core developers. I wonder if one of the Python books already has this written down somewhere. >From the manual one could then distill a spec for "what's needed in a package manager, what features a distutils upgrade would provide, and what a package should/may contain". That would be a basis for one or more PEPs. The "diff" approach is understandable at the beginning, because that's how the developers think of it, and how most users will approach it initially. We also needed real-world experience to see if the setuptools approach was even feasable large-scale or whether it needed major changes. Now we have more experience, and more Pythoneers are appearing who are unfamiliar with the "distutils-only" approach. So requests like Talin's will become more frequent. It's such a big job and Python 2.6 is slated as "minimal features" release, so it may be better to target this for Python 3 and backport it if possible. -- Mike Orr From pje at telecommunity.com Sun Nov 26 23:36:27 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Sun, 26 Nov 2006 17:36:27 -0500 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <6e9196d20611261321m5142989yef4c9180ebc9427e@mail.gmail.com > References: <4569F7E3.9040004@acm.org> <4569F7E3.9040004@acm.org> Message-ID: <5.1.1.6.0.20061126172911.03ef19b0@sparrow.telecommunity.com> At 01:21 PM 11/26/2006 -0800, Mike Orr wrote: >A comprehensive third-party manual that integrates the documentation >would be a good place to start. Even the outline of such a manual >would be a good. That would give a common baseline of understanding >for package users, package developers, and core developers. 
A number of people have written quick-start or how-to guides for setuptools, although I haven't been keeping track. I have noticed, however, that a signficant number of help requests for setuptools can be answered by internal links to one of its manuals -- and when a topic comes up that isn't in the manual, I usually add it. The "diff" issue is certainly there, of course, as is the fact that there are multiple manuals. However, I don't think the answer is fewer manuals, in fact it's likely to be having *more*. What exists right now is a developer's guide and reference for setuptools, a reference for the pkg_resources API, and an all-purpose handbook for easy_install. Each of these could use beginner's introductions or tutorials that are deliberately short on details, but which provide links to the relevant sections of the comprehensive manuals. My emphasis on the existing manuals was aimed at early adopters, who were likely to be familiar with at least some of distutils' hazards and difficulties, and thus would learn most quickly (and be most motivated) by seeing what was different. Obviously, nearly everybody in that camp has either already switched or decided they're not switching due to investment in other distutils-wrapping technologies and/or incompatible philosophies. So, the manuals are no longer adequate for the next wave of developers. Anyway, I would be happy to link from the manuals and Cheeseshop page to quality tutorials that focus on one or more aspects of developing, packaging, or distributing Python projects using setuptools. From guido at python.org Mon Nov 27 01:07:59 2006 From: guido at python.org (Guido van Rossum) Date: Sun, 26 Nov 2006 16:07:59 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <4569E694.7040901@v.loewis.de> References: <4569E694.7040901@v.loewis.de> Message-ID: Excellent! Like Aahz, I have no cycles, but I think it's a worthy goal. --Guido On 11/26/06, "Martin v. 
Löwis" wrote: > Ian Murdock schrieb: > > I'm CTO of the Free Standards Group and chair of the Linux Standard > > Base (LSB), the interoperability standard for the Linux distributions. > > We're wanting to add Python to the next version of the LSB (LSB 3.2) [1] > > and are looking for someone (or, better, a few folks) in the Python > > community to help us lead the effort to do that. The basic goal > > is to standardize the Python environment compliant Linux distributions > > (Red Hat, SUSE, Debian, Ubuntu, etc.) provide so that > > application developers can write LSB compliant applications in Python. > > I wrote to Ian that I would be interested; participating in the meeting > in Berlin is quite convenient. I can try to keep python-dev updated. > > Regards, > Martin > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From sluggoster at gmail.com Mon Nov 27 04:05:06 2006 From: sluggoster at gmail.com (Mike Orr) Date: Sun, 26 Nov 2006 19:05:06 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <5.1.1.6.0.20061126172911.03ef19b0@sparrow.telecommunity.com> References: <4569F7E3.9040004@acm.org> <5.1.1.6.0.20061126172911.03ef19b0@sparrow.telecommunity.com> Message-ID: <6e9196d20611261905u2939eacal3313b2d3d192420f@mail.gmail.com> On 11/26/06, Phillip J. Eby wrote: > I have noticed, however, that a signficant number of help requests for > setuptools can be answered by internal links to one of its manuals -- and > when a topic comes up that isn't in the manual, I usually add it. Hmm, I may have a couple topics for you after I check my notes. > The "diff" issue is certainly there, of course, as is the fact that there > are multiple manuals. 
However, I don't think the answer is fewer manuals, > in fact it's likely to be having *more*. What exists right now is a > developer's guide and reference for setuptools, a reference for the > pkg_resources API, and an all-purpose handbook for easy_install. Each of > these could use beginner's introductions or tutorials that are deliberately > short on details, but which provide links to the relevant sections of the > comprehensive manuals. I could see a comprehensive manual running forty pages, and most readers only caring about a small fraction of it. So you have a point. Maybe more important than one book is having "one place to go", a TOC of articles that are all independent yet written to complement each other. But Talin's point is still valid. Users have questions like, "How do I structure my package so it takes advantage of all the gee-whiz cheeseshop features? Where do I put my tests? Should I use unittest, py.test, or nose? How will users see my README and my docs if they easy_install my package? What are all those files in the EGG-INFO directory? What's that word 'distribution' in some of the function signatures? How do I use entry points, they look pretty complicated?" Some of these questions are multi-tool or are outside the scope of setuptools; some span both the Peak docs and the Python docs. People need an answer that starts with their question, rather than an answer that's a section in a manual describing a particular tool. -- Mike Orr From talin at acm.org Mon Nov 27 04:11:18 2006 From: talin at acm.org (Talin) Date: Sun, 26 Nov 2006 19:11:18 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <6e9196d20611261905u2939eacal3313b2d3d192420f@mail.gmail.com> References: <4569F7E3.9040004@acm.org> <5.1.1.6.0.20061126172911.03ef19b0@sparrow.telecommunity.com> <6e9196d20611261905u2939eacal3313b2d3d192420f@mail.gmail.com> Message-ID: <456A5756.80406@acm.org> Mike Orr wrote: > On 11/26/06, Phillip J. 
Eby wrote: >> I have noticed, however, that a signficant number of help requests for >> setuptools can be answered by internal links to one of its manuals -- and >> when a topic comes up that isn't in the manual, I usually add it. > > Hmm, I may have a couple topics for you after I check my notes. > >> The "diff" issue is certainly there, of course, as is the fact that there >> are multiple manuals. However, I don't think the answer is fewer manuals, >> in fact it's likely to be having *more*. What exists right now is a >> developer's guide and reference for setuptools, a reference for the >> pkg_resources API, and an all-purpose handbook for easy_install. Each of >> these could use beginner's introductions or tutorials that are deliberately >> short on details, but which provide links to the relevant sections of the >> comprehensive manuals. > > I could see a comprehensive manual running forty pages, and most > readers only caring about a small fraction of it. So you have a > point. Maybe more impotant than one book is having "one place to go", > a TOC of articles that are all independent yet written to complement > each other. > > But Talin's point is still valid. Users have questions like, "How do > I structure my package so it takes advantage of all the gee-whiz > cheeseshop features? Where do I put my tests? Should I use unittest, > py.test, or nose? How will users see my README and my docs if they > easy_install my package? What are all those files in the EGG-INFO > directory? What's that word 'distribution' in some of the function > signatures? How do I use entry points, they look pretty complicated?" > Some of these questions are multi-tool or are outside the scope of > setuptools; some span both the Peak docs and the Python docs. People > need an answer that starts with their question, rather than an answer > that's a section in a manual describing a particular tool. 
You said it way better than I did - I feel totally validated now :) -- Talin From rhamph at gmail.com Mon Nov 27 10:17:10 2006 From: rhamph at gmail.com (Adam Olsen) Date: Mon, 27 Nov 2006 02:17:10 -0700 Subject: [Python-Dev] infinities In-Reply-To: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> References: <1d85506f0611260740l5fcc3222q74a33b34ee5a7c3b@mail.gmail.com> Message-ID: On 11/26/06, tomer filiba wrote: > i found several places in my code where i use positive infinity > (posinf) for various things, i.e., > > > i like the concept, but i hate the "1e10000" stuff... why not add > posint, neginf, and nan to the float type? i find it much more readable as: > > if limit < 0: > limit = float.posinf > > posinf, neginf and nan are singletons, so there's no problem with > adding as members to the type. There's no reason this has to be part of the float type. Just define your own PosInf/NegInf singletons and PosInfType/NegInfType classes, giving them the appropriate special methods. NaN is a bit iffier, but in your case it's sufficient to raise an exception whenever it would be created. Consider submitting it to the Python Cookbook when you're done. ;) -- Adam Olsen, aka Rhamphoryncus From jmatejek at suse.cz Mon Nov 27 14:38:13 2006 From: jmatejek at suse.cz (Jan Matejek) Date: Mon, 27 Nov 2006 14:38:13 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> Message-ID: <456AEA45.7060209@suse.cz> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Phillip J. Eby napsal(a): > Just a suggestion, but one issue that I think needs addressing is the FHS > language that leads some Linux distros to believe that they should change > Python's normal installation layout (sometimes in bizarre ways) (...) 
> Other vendors apparently also patch Python in various > ways to support their FHS-based theories of how Python should install > files. +1 on that. There should be a clear (and clearly presented) idea of how Python is supposed to be laid out in the distribution-provided /usr hierarchy. And it would be nice if this idea complied with the FHS. It would also be nice if somebody finally admitted the existence of /usr/lib64 and made Python aware of it ;e) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org iD8DBQFFaupFjBrWA+AvBr8RArJcAKCGbeoih7TwKp2tBHtV3RMoY4JqvQCeJq87 +RgREnCI7DM/G5MNtjqmdVI= =WHpB -----END PGP SIGNATURE----- From pje at telecommunity.com Mon Nov 27 15:09:35 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Mon, 27 Nov 2006 09:09:35 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456AEA45.7060209@suse.cz> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> At 02:38 PM 11/27/2006 +0100, Jan Matejek wrote: >-----BEGIN PGP SIGNED MESSAGE----- >Hash: SHA1 > >Phillip J. Eby napsal(a): > > Just a suggestion, but one issue that I think needs addressing is the FHS > > language that leads some Linux distros to believe that they should change > > Python's normal installation layout (sometimes in bizarre ways) (...) > > Other vendors apparently also patch Python in various > > ways to support their FHS-based theories of how Python should install > > files. > >+1 on that. There should be a clear (and clearly presented) idea of how >Python is supposed to be laid out in the distribution-provided /usr >hierarchy. And it would be nice if this idea complied to FHS. 
> >It would also be nice if somebody finally admitted the existence of >/usr/lib64 and made Python aware of it ;e) Actually, I meant that (among other things) it should be clarified that it's alright to e.g. put .pyc and data files inside Python library directories, and NOT okay to split them up. From jason.orendorff at gmail.com Mon Nov 27 17:00:57 2006 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Mon, 27 Nov 2006 11:00:57 -0500 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <4564C5E6.8070605@v.loewis.de> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com> <4564C5E6.8070605@v.loewis.de> Message-ID: Way back on 11/22/06, "Martin v. Löwis" wrote: > Nick Coghlan schrieb: > > Martin v. L?wis wrote: > >> I personally consider it "good style" to rely on implementation details > >> of CPython; > > > > Is there a 'do not' missing somewhere in there? > > No - I really mean it. I can find nothing wrong with people relying on > reference counting to close files, for example. It's a property of > CPython, and not guaranteed in other Python implementations - yet it > works in a well-defined way in CPython. Code that relies on that feature > is not portable, but portability is only one goal in software > development, and may be irrelevant for some projects. It's not necessarily future-portable either. Having your software not randomly break over time is relevant for most nontrivial projects. > Similarly, it's fine when people rely on the C type "int" to have > 32-bits when used with gcc on x86 Linux. Relying on behavior that's implementation-defined in a particular way for a reason (like int being 32 bits on 32-bit hardware) is one thing. 
Relying on behavior that even the implementors might not be consciously aware of (or consider important to retain across versions) is another. -j From aahz at pythoncraft.com Mon Nov 27 17:43:23 2006 From: aahz at pythoncraft.com (Aahz) Date: Mon, 27 Nov 2006 08:43:23 -0800 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com> <4564C5E6.8070605@v.loewis.de> Message-ID: <20061127164323.GA21272@panix.com> On Mon, Nov 27, 2006, Jason Orendorff wrote: > Way back on 11/22/06, "Martin v. L?wis" wrote: >> Nick Coghlan schrieb: >>> Martin v. L?wis wrote: >>>> >>>> I personally consider it "good style" to rely on implementation details >>>> of CPython; >>> >>> Is there a 'do not' missing somewhere in there? >> >> No - I really mean it. I can find nothing wrong with people relying on >> reference counting to close files, for example. It's a property of >> CPython, and not guaranteed in other Python implementations - yet it >> works in a well-defined way in CPython. Code that relies on that feature >> is not portable, but portability is only one goal in software >> development, and may be irrelevant for some projects. > > It's not necessarily future-portable either. Having your software not > randomly break over time is relevant for most nontrivial projects. We recently had this discussion at my day job. We ended up agreeing that using close() was an encouraged but not required style, because to really avoid breakage we'd have to go with a full-bore try/except style for file handling, and that would require too many changes (especially without upgrading to 2.5, and we're still using 2.2/2.3). -- Aahz (aahz at pythoncraft.com) <*> http://www.pythoncraft.com/ Usenet is not a democracy. It is a weird cross between an anarchy and a dictatorship. 
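The contrast Martin and Aahz are discussing can be sketched in a few lines. This is an illustration written for this archive, not code from the thread: the first style leans on CPython's reference counting, while the second is the portable explicit-close idiom (which Python 2.5's `with` statement later shortened).

```python
# A minimal sketch of the two file-closing styles under discussion.
# Style 1 relies on CPython's reference counting -- an implementation
# detail, as Martin notes; style 2 is the portable try/finally form
# Aahz describes.

import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Style 1: refcount-reliant. On CPython the file object's refcount
# drops to zero right after this statement, which flushes and closes
# the file. On Jython or IronPython the close may happen much later.
open(path, 'w').write('data')

# Style 2: explicit close, guaranteed on any implementation.
f = open(path)
try:
    contents = f.read()
finally:
    f.close()

assert contents == 'data'
assert f.closed
os.remove(path)
```

The full-bore version Aahz mentions wraps every open/use/close in such a try/finally; the cost in boilerplate is exactly why many projects settled for the refcount-reliant style on CPython.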
From jason.orendorff at gmail.com Mon Nov 27 20:47:52 2006 From: jason.orendorff at gmail.com (Jason Orendorff) Date: Mon, 27 Nov 2006 14:47:52 -0500 Subject: [Python-Dev] PyFAQ: thread-safe interpreter operations In-Reply-To: <20061127164323.GA21272@panix.com> References: <435DF58A933BA74397B42CDEB8145A8606E5D424@ex9.hostedexchange.local> <45636067.7040305@v.loewis.de> <20061121215620.GA24206@code0.codespeak.net> <45637C95.6050907@v.loewis.de> <45641927.7080501@gmail.com> <4564C5E6.8070605@v.loewis.de> <20061127164323.GA21272@panix.com> Message-ID: On 11/27/06, Aahz wrote: > On Mon, Nov 27, 2006, Jason Orendorff wrote: > > Way back on 11/22/06, "Martin v. L?wis" wrote: > >> [...] I can find nothing wrong with people relying on > >> reference counting to close files, for example. It's a property of > >> CPython, and not guaranteed in other Python implementations - yet it > >> works in a well-defined way in CPython. [...] > > > > [Feh.] > > We recently had this discussion at my day job. We ended up agreeing > that using close() was an encouraged but not required style, because to > really avoid breakage we'd have to go with a full-bore try/except style > for file handling, and that would require too many changes (especially > without upgrading to 2.5, and we're still using 2.2/2.3). Well, CPython's refcounting is something Python-dev is (understatement) very conscious of. I think I've even heard assurances that it won't change Any Time Soon. But this isn't the case for every CPython implementation detail. Remember what brought all this up. If it's obscure enough that Fredrik Lundh has to ask around, I wouldn't bet the ranch on it. 
-j From r.m.oudkerk at googlemail.com Mon Nov 27 21:36:21 2006 From: r.m.oudkerk at googlemail.com (Richard Oudkerk) Date: Mon, 27 Nov 2006 20:36:21 +0000 Subject: [Python-Dev] Cloning threading.py using processes Message-ID: Version 0.10 of the 'processing' package is available at the cheeseshop: http://cheeseshop.python.org/processing It is intended to make writing programs using processes almost the same as writing programs using threads. (By importing from 'processing.dummy' instead of 'processing' one can use threads with the same API.) It has been tested on both Windows and Unix. Shared objects are created on a 'manager' which runs in its own process. Communication with it happens using sockets or (on windows) named pipes. An example where integers are sent through a shared queue from a child process to its parent: . from processing import Process, Manager . . def f(q): . for i in range(10): . q.put(i*i) . q.put('STOP') . . if __name__ == '__main__': . manager = Manager() . queue = manager.Queue(maxsize=10) . . p = Process(target=f, args=[queue]) . p.start() . . result = None . while result != 'STOP': . result = queue.get() . print result . . p.join() It has had some changes since the version I posted last month: 1) The use of tokens to identify shared objects is now hidden, so now the API of 'processing' really is very similar to that of 'threading'. 2) It is much faster than before: on both Windows XP and Linux a manager serves roughly 20,000 requests/second on a 2.5 GHz Pentium 4. (Though it is not a fair comparison, that is 50-100 times faster than using SimpleXMLRPCServer/xmlrpclib.) 3) The manager process just reuses the standard synchronization types from threading.py, Queue.py and spawns a new thread to serve each process/thread which owns a proxy. (The old version was single threaded and had a select loop.) 4) Registering new shared types is straightforward, for instance . from processing.manager import ProcessBaseManager . . class Foo(object): . 
def bar(self): . print 'BAR' . . class NewManager(ProcessBaseManager): . pass . . NewManager.register('Foo', Foo, exposed=['bar']) . . if __name__ == '__main__': . manager = NewManager() . foo = manager.Foo() . foo.bar() # => prints 'BAR' Cheers Richard From martin at v.loewis.de Tue Nov 28 00:34:28 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Nov 2006 00:34:28 +0100 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <4569FD85.4010006@acm.org> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> Message-ID: <456B7604.6050809@v.loewis.de> Talin schrieb: > As far as rewriting it goes - I can only rewrite things that I understand. So if you want this to change, you obviously need to understand the entire distutils. It's possible to do that; some people have done it (the "understanding" part) - just go ahead and start reading source code. Regards, Martin From martin at v.loewis.de Tue Nov 28 00:39:06 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Nov 2006 00:39:06 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> Message-ID: <456B771A.8090300@v.loewis.de> Phillip J. Eby schrieb: > Actually, I meant that (among other things) it should be clarified that > it's alright to e.g. put .pyc and data files inside Python library > directories, and NOT okay to split them up. My gut feeling is that this is out of scope for the LSB. The LSB would only specify what a conforming distribution should do, not what conforming applications need to do. But we will see. 
Regards, Martin From martin at v.loewis.de Tue Nov 28 01:06:43 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Nov 2006 01:06:43 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456AEA45.7060209@suse.cz> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> Message-ID: <456B7D93.2090004@v.loewis.de> Jan Matejek schrieb: > +1 on that. There should be a clear (and clearly presented) idea of how > Python is supposed to be laid out in the distribution-provided /usr > hierarchy. And it would be nice if this idea complied to FHS. The LSB refers to the FHS, so it is clear that LSB support for Python will have to follow the FHS. Specifically, LSB 3.1 includes FHS 2.3 as a normative reference. > It would also be nice if somebody finally admitted the existence of > /usr/lib64 and made Python aware of it ;e) I don't think this is really relevant for Python. The FHS specifies that 64-bit libraries must be in /lib64 on AMD64-Linux. It is silent on where to put Python source files and .pyc files, and, indeed, putting them into /usr/lib/pythonX.Y seems to be FHS-conforming: # /usr/lib includes object files, libraries, and internal binaries that # are not intended to be executed directly by users or shell scripts. In any case, changing Python is certainly out of the scope of the LSB committee: they might put requirements on Python installations, but it's not their job to "fix" Python. Regards, Martin From sluggoster at gmail.com Tue Nov 28 02:44:54 2006 From: sluggoster at gmail.com (Mike Orr) Date: Mon, 27 Nov 2006 17:44:54 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <456B7604.6050809@v.loewis.de> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> <456B7604.6050809@v.loewis.de> Message-ID: <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> On 11/27/06, "Martin v. 
Löwis" wrote: > Talin schrieb: > > As far as rewriting it goes - I can only rewrite things that I understand. > > So if you want this to change, you obviously need to understand the > entire distutils. It's possible to do that; some people have done > it (the "understanding" part) - just go ahead and start reading source > code. You (and Fredrik) are being a little harsh on Talin. I understand the need to encourage people to fix things themselves rather than just complaining about stuff they don't like. But people don't have an unlimited amount of time and expertise to work on several Python projects simultaneously. Nevertheless, they should be able to offer an "It would be good if..." suggestion without being stomped on. The suggestion itself can be a contribution if it focuses people's attention on a problem and a potential solution. Just because somebody can't learn a big subsystem and write code or docs for it *at this moment* doesn't mean they never will. And even if they don't, it's possible to make contributions in one area of Python and suggestions in another... or does the karma account not work that way? I don't see Talin saying, "You should fix this for me." He's saying, "I'd like this improved and I'm working on it, but it's a big job and I need help, ideally from someone with more expertise in distutils." Ultimately for Python the question isn't, "Does Talin want this done?" but, "Does this dovetail with the direction Python generally wants to go?" From what I've seen of setuptools/distutils evolution, yes, it's consistent with what many people want for Python. So instead of saying, "You (Talin) should take on this task alone because you want it" as if nobody else did, it would be better to say, "Thank you, Talin, for moving this important Python issue along." I've privately offered Talin some (unfinished) material I've been working on anyway that relates to his vision. 
When I get some other projects cleared away I'd like to put together that TOC of links I mentioned and perhaps collaborate on a Guide with whoever wants to. But I also need to learn more about setuptools before I can do that. As it happens I need the information anyway because I'm about to package an egg.... -- Mike Orr From talin at acm.org Tue Nov 28 08:10:08 2006 From: talin at acm.org (Talin) Date: Mon, 27 Nov 2006 23:10:08 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> <456B7604.6050809@v.loewis.de> <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> Message-ID: <456BE0D0.9010507@acm.org> Mike Orr wrote: > On 11/27/06, "Martin v. L?wis" wrote: >> Talin schrieb: >>> As far as rewriting it goes - I can only rewrite things that I understand. >> So if you want this to change, you obviously need to understand the >> entire distutils. It's possible to do that; some people have done >> it (the "understanding" part) - just go ahead and start reading source >> code. > > You (and Fredrik) are being a little harsh on Talin. I understand the > need to encourage people to fix things themselves rather than just > complaining about stuff they don't like. But people don't have an > unlimited amount of time and expertise to work on several Python > projects simultaneously. Nevertheless, they should be able to offer > an "It would be good if..." suggestion without being stomped on. The > suggestion itself can be a contribution if it focuses people's > attention on a problem and a potential solution. Just because > somebody can't learn a big subsystem and write code or docs for it *at > this moment* doesn't mean they never will. And even if they don't, > it's possible to make contributions in one area of Python and > suggestions in another... or does the karma account not work that way? 
> > I don't see Talin saying, "You should fix this for me." He's saying, > "I'd like this improved and I'm working on it, but it's a big job and > I need help, ideally from someone with more expertise in distutils." > Ultimately for Python the question isn't, "Does Talin want this done?" > but, "Does this dovetail with the direction Python generally wants to > go?" From what I've seen of setuptools/distutils evolution, yes, it's > consistent with what many people want for Python. So instead of > saying, "You (Talin) should take on this task alone because you want > it" as if nobody else did, it would be better to say, "Thank you, > Talin, for moving this important Python issue along." > > I've privately offered Talin some (unfinished) material I've been > working on anyway that relates to his vision. When I get some other > projects cleared away I'd like to put together that TOC of links I > mentioned and perhaps collaborate on a Guide with whoever wants to. > But I also need to learn more about setuptools before I can do that. > As it happens I need the information anyway because I'm about to > package an egg.... > What you are saying is basically correct, although I have a slightly different spin on it. I've written a lot of documentation over the years, and I know that one of the hardest parts of writing documentation is trying to identify your own assumptions. To someone who already knows how the system works, it's hard to understand the mindset of someone who is just learning it. You tend to unconsciously assume knowledge of certain things which a new user might not know. To that extent, it can be useful sometimes to have someone who is in the process of learning how to use the system, and who is willing to carefully analyze and write down their own experiences while doing so. Most of the time people are too busy to do this - they want to get their immediate problem solved, and they aren't interested in how difficult it will be for the next person. 
This is especially true in cases where the problem that is holding them up is three levels down from the level where their real goal is - they want to be able to "pop the stack" of problems as quickly as possible, so that they can get back to solving their *real* problem. So what I am offering, in this case, is my ignorance -- but a carefully described ignorance :) I don't demand that anyone do anything - I'm merely pointing out some things that people may or may not care about. Now, in this particular case, I have actually used distutils before. But distutils is one of those systems (like Perl) which tends to leak out of your brain if you don't use it regularly - that is, if you only use it once every 6 months, at the end of 6 months you have forgotten most of what you have learned, and you have to start the learning curve all over again. And I am in the middle of that re-learning process right now. What I am doing right now is creating a new extension project using setuptools, and keeping notes on what I do. So for example, I start by creating the directory structure: mkdir myproject cd myproject mkdir src mkdir test Next, create a minimal setup.py script. I won't include that here, but it's in the notes. Next, create the myproject.c file for the module in src/, and write the 'init' function for the module. (again, content omitted but it's in my notes). Create a projectname_unittest.py file in test. Add both of these to the setup.py file. At this point, you ought to be able to do a "python setup.py test" and have it succeed. At this point, you can start adding types and methods, with a unit test for each one, testing each one as it is added. Now, I realize that all of this is "baby steps" to you folks, but it took me a day or so to figure out. And it's interesting that even these few steps cut across a number of tools and libraries - setuptools, distutils, unittest, the "extending Python" doc and the "Python C API" doc. 
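[Editor's note: since the minimal setup.py stays in Talin's notes, here is a hypothetical sketch of what a script for the layout above might look like. Every name in it (myproject, src/myproject.c, test.myproject_unittest) is illustrative only, not taken from those notes; it is a configuration sketch, not Talin's actual script.]

```python
# Hypothetical minimal setup.py for the layout sketched above:
#   myproject/setup.py
#   myproject/src/myproject.c
#   myproject/test/myproject_unittest.py
# All names are illustrative; this is a sketch, not the script from
# Talin's notes.

from setuptools import setup, Extension

setup(
    name='myproject',
    version='0.1',
    # The C module lives in src/; setuptools builds it in place for
    # the 'test' command.
    ext_modules=[Extension('myproject', ['src/myproject.c'])],
    # What 'python setup.py test' runs (the setuptools test_suite hook).
    test_suite='test.myproject_unittest',
)
```

With this in place, the "python setup.py test" step in the walkthrough builds the extension and runs the unit tests in one go.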
(BTW, I realized another thing that would be really handy is if the "extending Python" doc contained hyperlink references to the "Python C API" doc, so that when it talks about, say, PyArg_ParseTuple, you could go straight to the reference doc for it.) -- Talin From robinbryce at gmail.com Tue Nov 28 13:19:50 2006 From: robinbryce at gmail.com (Robin Bryce) Date: Tue, 28 Nov 2006 12:19:50 +0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> Message-ID: > Actually, I meant that (among other things) it should be clarified that > it's alright to e.g. put .pyc and data files inside Python library > directories, and NOT okay to split them up. Phillip, Just to be clear: I understand you are not in favour of re-packaging data from python projects (projects in the distutils sense), separately and I strongly agree with this view. Are you opposed to developers choosing to *not* bundle data as python package data ? How much, if any, of the setuptools / distutils conventions do you think could sensibly percolate up to the LSB ? There are a couple of cases in ubuntu/debian (as of 6.10 edgy) that I think are worth considering: python2.4 profile (pstats) etc. was removed due to licensing issues rather than FHS. Should not be an issue for python2.5 but what, in general, can a vendor do except break python if their licensing policy can't accommodate all of Python's batteries ? python2.4 distutils is excluded by default. This totally blows in my view but I appreciate this one is a minefield of vendor packaging politics. It has to be legitimate for Python / setuptools to provide packaging infrastructure and conventions that are viable on more than Linux. 
Is it unreasonable for a particular vendor to decide that, on their platform, they will disable Python's packaging conventions ? Is there any way to keep the peace on this one ? Cheers, Robin On 27/11/06, Phillip J. Eby wrote: > At 02:38 PM 11/27/2006 +0100, Jan Matejek wrote: > >-----BEGIN PGP SIGNED MESSAGE----- > >Hash: SHA1 > > > >Phillip J. Eby napsal(a): > > > Just a suggestion, but one issue that I think needs addressing is the FHS > > > language that leads some Linux distros to believe that they should change > > > Python's normal installation layout (sometimes in bizarre ways) (...) > > > Other vendors apparently also patch Python in various > > > ways to support their FHS-based theories of how Python should install > > > files. > > > >+1 on that. There should be a clear (and clearly presented) idea of how > >Python is supposed to be laid out in the distribution-provided /usr > >hierarchy. And it would be nice if this idea complied to FHS. 
> > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/robinbryce%40gmail.com > From anthony at interlink.com.au Tue Nov 28 14:53:14 2006 From: anthony at interlink.com.au (Anthony Baxter) Date: Wed, 29 Nov 2006 00:53:14 +1100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> Message-ID: <200611290053.17199.anthony@interlink.com.au> On Tuesday 28 November 2006 23:19, Robin Bryce wrote: > python2.4 profile (pstats) etc. was removed due to licensing > issues rather than the FHS. Should not be an issue for python2.5 but > what, in general, can a vendor do except break Python if their > licensing policy can't accommodate all of Python's batteries? That's a historical case, and as far as I know, unique. I can't imagine we'd accept any new standard library contributions (no matter how compelling) without the proper licensing work being done. > python2.4 distutils is excluded by default. This totally blows in > my view but I appreciate this one is a minefield of vendor > packaging politics. It has to be legitimate for Python / > setuptools to provide packaging infrastructure and conventions > that are viable on more than Linux. Is it unreasonable for a > particular vendor to decide that, on their platform, they will > disable Python's packaging conventions? Is there any way to keep > the peace on this one? I still have no idea why this was done - I was also one of the people who jumped up and down asking Debian/Ubuntu to fix this idiotic decision. Personally, I consider any distribution that breaks the standard library into non-required pieces to be shipping a _broken_ Python. As someone who writes and releases software, this is a complete pain.
I can't tell you how many times through the years I'd get user complaints because they didn't get distutils installed as part of the standard library. (The only other packaging thing like this that I'm aware of is python-minimal in Ubuntu. This is done for installation purposes and wacky dependency issues that occur when a fair chunk of the O/S is actually written in Python. It's worth noting that the entirety of the Python stdlib is a required package, so it doesn't cause issues.) Anthony -- Anthony Baxter It's never too late to have a happy childhood. From barry at python.org Tue Nov 28 16:26:53 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 10:26:53 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <200611290053.17199.anthony@interlink.com.au> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 8:53 AM, Anthony Baxter wrote: > (The only other packaging thing like this that I'm aware of is > python-minimal in Ubuntu. This is done for installation purposes > and wacky dependency issues that occur when a fair chunk of the O/S > is actually written in Python. It's worth noting that the entirety > of the Python stdlib is a required package, so it doesn't cause > issues.) There's a related issue that may or may not be in scope for this thread. For distros like Gentoo or Ubuntu that rely heavily on their own system Python for the OS to work properly, I'm quite loathe to install Cheeseshop packages into the system site-packages. I've had Gentoo break occasionally when I did this for example (though I don't remember the details now), so I always end up installing my own /usr/local/bin/python and installing my 3rd party packages into there.
Even though site-packages is last on sys.path, installing 3rd party packages can still break the OS if the system itself installs incompatible versions of such packages into its site-packages. Mailman's philosophy is to install the 3rd party packages it requires into its own 'pythonlib' directory that gets put first on sys.path. It does this for several reasons: I want to be able to override stdlib packages such as email with newer versions, I don't want to have to mess around at all with the system's site-packages, and I don't want updates to the system Python to break my application. I question whether a distro built on Python can even afford to allow 3rd party packages to be installed in their system's site-packages. Maybe Python needs to extend its system-centric view of site-packages with an application-centric and/or user-centric view of extensions? The only reason I can think of for Mailman /not/ using its own pythonlib is to save on disk space, and really, who cares about that any more? I submit that most applications of any size will have way more application data than duplicated Python libraries. 
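The private-pythonlib pattern Barry describes can be sketched in a few lines. The directory name below is illustrative, not Mailman's actual layout:

```python
import sys

# Sketch of the private-library pattern described above: the application
# ships its own copies of the packages it depends on (including overrides
# of stdlib packages such as a newer email package) and puts that
# directory *first* on sys.path, so its bundled versions win over both
# the stdlib and the system's site-packages.  PYTHONLIB is hypothetical.
PYTHONLIB = "/usr/local/myapp/pythonlib"

def prepend_private_lib(path=PYTHONLIB):
    """Ensure `path` is the very first sys.path entry."""
    while path in sys.path:
        sys.path.remove(path)
    sys.path.insert(0, path)

prepend_private_lib()
print(sys.path[0])  # the private directory now shadows everything else
```

Because the private directory sits ahead of everything else, updates to the system Python's site-packages cannot change which versions the application imports.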
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWxVQ3EjvBPtnXfVAQIuMAQAkciyaHCwLnkN+8GwbhUro+vJuna+JObP AZaNzPKYABITqu5fKPl3aEvQz+9pNUvjM2c/q5p1m/9n34ZBURfgpHa3yk7QcbW0 sud8utdW6wMHMuWVw/1lQNaZ2GeJz9E4CgO93btfgiMLFIrcnBxr6uw5NqTrMwOc 4iIupbjYfUg= =Nxff -----END PGP SIGNATURE----- From martin at v.loewis.de Tue Nov 28 19:08:17 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Tue, 28 Nov 2006 19:08:17 +0100 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <456BE0D0.9010507@acm.org> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> <456B7604.6050809@v.loewis.de> <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> <456BE0D0.9010507@acm.org> Message-ID: <456C7B11.1010409@v.loewis.de> Talin schrieb: > To that extent, it can be useful sometimes to have someone who is in the > process of learning how to use the system, and who is willing to > carefully analyze and write down their own experiences while doing so. I readily agree that the documentation can be improved, and applaud efforts to do so. And I have no doubts that distutils is difficult to learn for a beginner. In Talin's remarks, there was also the suggestion that distutils is "in need of some serious refactoring". It is such remarks that get me started: it seems useless to me to make such a statement if they are not accompanied with concrete proposals what specifically to change. It also gets me upset because it suggests that all prior contributors weren't serious. 
Regards, Martin From martin at v.loewis.de Tue Nov 28 19:17:04 2006 From: martin at v.loewis.de (Martin v. Löwis) Date: Tue, 28 Nov 2006 19:17:04 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> Message-ID: <456C7D20.1080604@v.loewis.de> Robin Bryce schrieb: > python2.4 profile (pstats) etc. was removed due to licensing issues > rather than the FHS. Should not be an issue for python2.5 but what, in > general, can a vendor do except break Python if their licensing policy > can't accommodate all of Python's batteries? If some vendor has a valid concern about the licensing of a certain piece of Python, they should bring that up while the LSB is being defined. > python2.4 distutils is excluded by default. This totally blows in my > view but I appreciate this one is a minefield of vendor packaging > politics. It has to be legitimate for Python / setuptools to provide > packaging infrastructure and conventions that are viable on more than > Linux. Again, that's a decision for the LSB standard to make. If the LSB defines that distutils is part of the LSB (notice the *if*: this is all theoretical; the LSB doesn't yet define anything for Python), then each vendor can still choose to include distutils or not; if they don't, they won't comply with that version of the LSB. So it is *always* their choice what standard to follow. OTOH, certain customers demand LSB conformance, so a vendor that chooses not to follow the LSB may lose customers. I personally agree that "Linux standards" should specify a standard layout for a Python installation, and that it should be the one that "make install" generates (perhaps after "make install" is adjusted). Whether or not it is the *LSB* that needs to specify that, I don't know, because the LSB does not specify a file system layout.
Instead, it incorporates the FHS - which might be the right place to define the layout of a Python installation. For the LSB, it's more important that "import httplib" gives you something working, no matter where httplib.py comes from (or whether it comes from httplib.py at all). Regards, Martin From talin at acm.org Tue Nov 28 19:33:08 2006 From: talin at acm.org (Talin) Date: Tue, 28 Nov 2006 10:33:08 -0800 Subject: [Python-Dev] Distribution tools: What I would like to see In-Reply-To: <456C7B11.1010409@v.loewis.de> References: <4569F7E3.9040004@acm.org> <4569FD85.4010006@acm.org> <456B7604.6050809@v.loewis.de> <6e9196d20611271744o2e4a8795od83828e280bfeeb9@mail.gmail.com> <456BE0D0.9010507@acm.org> <456C7B11.1010409@v.loewis.de> Message-ID: <456C80E4.5060000@acm.org> Martin v. Löwis wrote: > Talin schrieb: >> To that extent, it can be useful sometimes to have someone who is in the >> process of learning how to use the system, and who is willing to >> carefully analyze and write down their own experiences while doing so. > > I readily agree that the documentation can be improved, and applaud > efforts to do so. And I have no doubts that distutils is difficult to > learn for a beginner. > > In Talin's remarks, there was also the suggestion that distutils is > "in need of some serious refactoring". It is such remarks that get > me started: it seems useless to me to make such a statement if they > are not accompanied with concrete proposals what specifically to > change. It also gets me upset because it suggests that all prior > contributors weren't serious. I'm sorry if I implied that distutils was 'misdesigned'; that wasn't what I meant. Refactoring is usually desirable when a body of code has accumulated a lot of additional baggage as a result of maintenance and feature additions, accompanied by the observation that if the baggage had been present when the system was originally created, the design of the system would have been substantially different.
Refactoring is merely an attempt to discover what that original design might have been, if the requirements had been known at the time. What I was reacting to, I think, is that it seemed like in some ways the 'diffness' of setuptools wasn't just in the documentation, but in the code itself, and if both setuptools and distutils had been co-developed, then distutils might have been somewhat different as a result. Also, I admit that some of this is hearsay, so maybe I should just back off on this one. > Regards, > Martin From sluggoster at gmail.com Tue Nov 28 20:41:48 2006 From: sluggoster at gmail.com (Mike Orr) Date: Tue, 28 Nov 2006 11:41:48 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <6e9196d20611281141h428478c8v5329f9ca5433a7bf@mail.gmail.com> On 11/28/06, Barry Warsaw wrote: > For distros like Gentoo or Ubuntu that rely heavily on their > own system Python for the OS to work properly, I'm quite loathe to > install Cheeseshop packages into the system site-packages. I've had > Gentoo break occasionally when I did this for example (though I don't > remember the details now), so I always end up installing my own > /usr/local/bin/python and installing my 3rd party packages into there. > Even though site-packages is last on sys.path, installing 3rd party > packages can still break the OS if the system itself installs > incompatible versions of such packages into its site-packages. One wishes distro vendors would install a separate copy of Python for their internal OS stuff so that broken-library or version issues wouldn't affect the system. That would be worth putting into the standard.
-- Mike Orr From barry at python.org Tue Nov 28 21:11:57 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 15:11:57 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <6e9196d20611281141h428478c8v5329f9ca5433a7bf@mail.gmail.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <6e9196d20611281141h428478c8v5329f9ca5433a7bf@mail.gmail.com> Message-ID: <4EB09625-4338-4537-9DDD-088359E2E33A@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 2:41 PM, Mike Orr wrote: > On 11/28/06, Barry Warsaw wrote: >> For distros like Gentoo or Ubuntu that rely heavily on their >> own system Python for the OS to work properly, I'm quite loathe to >> install Cheeseshop packages into the system site-packages. I've had >> Gentoo break occasionally when I did this for example (though I don't >> remember the details now), so I always end up installing my own /usr/ >> local/bin/python and installing my 3rd party packages into there. >> Even though site-packages is last on sys.path, installing 3rd party >> packages can still break the OS if the system itself installs >> incompatible versions of such packages into its site-packages. > > One wishes distro vendors would install a separate copy of Python for > their internal OS stuff so that broken-library or version issues > wouldn't affect the system. That would be worth putting into the > standard. Agreed. But that would just eliminate one potential source of "application" conflict (defining the OS itself as just another application). 
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWyYEnEjvBPtnXfVAQJCKQP7BXVOYUIvbEBgFK7nWHieBqRGXzohhKNZ SN5qV4P6uZGnCtjp1Z4W8U82X8TH+X3Ovx02mS+GN+nrlyF7AVhDr/mSLXI90Kan 1dqOhAIz5rBeT03/k0SpAPSiBhonl4zF4ZmezGaz3lif2CjsH6PT9153Mv7wXb1N ut2QIhXnejA= =jhbd -----END PGP SIGNATURE----- From theller at ctypes.org Tue Nov 28 21:26:54 2006 From: theller at ctypes.org (Thomas Heller) Date: Tue, 28 Nov 2006 21:26:54 +0100 Subject: [Python-Dev] ctypes and powerpc In-Reply-To: References: <456745DA.3010903@ctypes.org> Message-ID: Ronald Oussoren schrieb: > > On Friday, November 24, 2006, at 08:21PM, "Thomas Heller" wrote: >>I'd like to ask for help with an issue which I do not know >>how to solve. >> >>Please see this bug http://python.org/sf/1563807 >>"ctypes built with GCC on AIX 5.3 fails with ld ffi error" >> >>Apparently this is a powerpc machine, ctypes builds but cannot be imported >>because of undefined symbols like 'ffi_call', 'ffi_prep_closure'. >> >>These symbols are defined in the file >> Modules/_ctypes/libffi/src/powerpc/ffi_darwin.c. >>The whole contents of this file is enclosed within a >> >>#ifdef __ppc__ >>... >>#endif >> >>block. IIRC, this block was added by Ronald for the >>Mac universal build. Now, it seems that on the AIX machine >>the __ppc__ symbol is not defined; removing the #ifdef/#endif >>makes the build successful. > > The defines were indeed added for the universal build and I completely overlooked the fact that ffi_darwin.c is also used for AIX. One way to fix this is > > #if ! (defined(__APPLE__) && !defined(__ppc__)) > ... > #endif > > That is, compile the file unless __APPLE__ is defined but __ppc__ isn't. This more clearly documents the intent. Yes, this makes the most sense. I've taken this approach.
Thanks, Thomas From guido at python.org Tue Nov 28 22:05:20 2006 From: guido at python.org (Guido van Rossum) Date: Tue, 28 Nov 2006 13:05:20 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: On 11/28/06, Barry Warsaw wrote: > There's a related issue that may or may not be in scope for this > thread. For distros like Gentoo or Ubuntu that rely heavily on their > own system Python for the OS to work properly, I'm quite loathe to > install Cheeseshop packages into the system site-packages. I wonder if it would help if we were to add a vendor-packages directory where distros can put their own selection of 3rd party stuff they depend on, to be searched before site-packages, and a command-line switch that ignores site-packages but still searches vendor-packages. (-S would almost do it but probably suppresses too much.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From pje at telecommunity.com Tue Nov 28 22:19:47 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Tue, 28 Nov 2006 16:19:47 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> At 01:05 PM 11/28/2006 -0800, Guido van Rossum wrote: >On 11/28/06, Barry Warsaw wrote: > > There's a related issue that may or may not be in scope for this > > thread. For distros like Gentoo or Ubuntu that rely heavily on their > > own system Python for the OS to work properly, I'm quite loathe to > > install Cheeseshop packages into the system site-packages.
> >I wonder if would help if we were to add a vendor-packages directory >where distros can put their own selection of 3rd party stuff they >depend on, to be searched before site-packages, and a command-line >switch that ignores site-package but still searches vendor-package. >(-S would almost do it but probably suppresses too much.) They could also use -S and then explicitly insert the vendor-packages directory into sys.path at the beginning of their scripts. And a .pth in site-packages could add vendor-packages at the *end* of sys.path, so that scripts not using -S would pick it up. This would be backward compatible except for the vendor scripts that want to use this approach. From barry at python.org Wed Nov 29 00:45:04 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 18:45:04 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <7D58A2FE-DDB3-4104-B9FB-F13FD483FF83@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 4:05 PM, Guido van Rossum wrote: > On 11/28/06, Barry Warsaw wrote: >> There's a related issue that may or may not be in scope for this >> thread. For distros like Gentoo or Ubuntu that rely heavily on their >> own system Python for the OS to work properly, I'm quite loathe to >> install Cheeseshop packages into the system site-packages. > > I wonder if would help if we were to add a vendor-packages directory > where distros can put their own selection of 3rd party stuff they > depend on, to be searched before site-packages, and a command-line > switch that ignores site-package but still searches vendor-package. > (-S would almost do it but probably suppresses too much.) 
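The vendor-packages arrangement sketched above — a distro-owned directory wired into the path through a .pth file — can be demonstrated with the stdlib site module. Directory names here are illustrative; a real distro would register the .pth file in its default site directory:

```python
import os
import site
import sys
import tempfile

# Emulate a "vendor-packages" directory registered through a .pth file,
# as discussed in the thread.  site.addsitedir() processes any *.pth
# files it finds in the given directory and appends each listed path to
# sys.path -- so vendor packages end up *after* everything already on
# the path, matching the backward-compatible ordering described above.
root = tempfile.mkdtemp()
vendor = os.path.join(root, "vendor-packages")
os.makedirs(vendor)
with open(os.path.join(root, "vendor.pth"), "w") as f:
    f.write(vendor + "\n")

site.addsitedir(root)  # reads vendor.pth and adds vendor-packages
print(vendor in sys.path)  # → True
```

A script run with -S would skip this processing entirely, which is why the thread pairs the .pth idea with explicit sys.path setup for vendor scripts.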
I keep thinking I'd like to treat the OS as just another application, so that there's nothing special about it and the same infrastructure could be used for other applications with lots of entry level scripts. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWzKAHEjvBPtnXfVAQK9AAQAsJS2Ag9yBO+dLGiZdJlaWAj64zWcd9oi zqaE95/y53iXBvMBynglROApDEdOsnv/1/XSx1+2gZVIkuFvHLplbqZWVCsZ56r+ nAcTzFXsM2zPBSECKWuSfxBUILKalRdaIXKOUjgd0iZTrCbt3EeTmZlxMTKq9sGU 1Scr8sHSpIE= =rNjl -----END PGP SIGNATURE----- From greg at electricrain.com Wed Nov 29 00:25:49 2006 From: greg at electricrain.com (Gregory P. Smith) Date: Tue, 28 Nov 2006 15:25:49 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <20061128232549.GB17224@electricrain.com> > I question whether a distro built on Python can even afford to allow > 3rd party packages to be installed in their system's site-packages. > Maybe Python needs to extend its system-centric view of site-packages > with an application-centric and/or user-centric view of extensions? Agreed, I do not think that should be allowed. A system site-packages directory for a python install is a convenient band-aid but not a good idea for real-world deployment of anything third party. It suffers from the same classic DLL Hell problem that Windows has suffered with for eons, with applications all including the "same" DLLs and putting them in the system directory. I'm fine if an OS distro wants to use site-packages for things the OS depends on in its use of python. I'm fine with the OS offering its own packages (debs or rpms or whatnot) that install additional python libraries under site-packages for use system-wide or to satisfy dependencies from other system packages.
Those are all managed properly for compatibility at the OS distro level. What's bad is for third-party (non-os-distro-packaged) applications to touch site-packages. -greg From glyph at divmod.com Wed Nov 29 01:01:30 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 29 Nov 2006 00:01:30 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061129000130.11053.1150542058.divmod.xquotient.111@joule.divmod.com> On 11:45 pm, barry at python.org wrote: >I keep thinking I'd like to treat the OS as just another application, >so that there's nothing special about it and the same infrastructure >could be used for other applications with lots of entry level scripts. I agree. The motivation here is that the "OS" application keeps itself separate so that incorrect changes to configuration or installation of incompatible versions of dependencies don't break it. There are other applications which also don't want to break. This is a general problem with Python, one that should be solved with a comprehensive parallel installation or "linker" which explicitly describes dependencies and allows for different versions of packages. I definitely don't think that this sort of problem should be solved during the *standardization* process - that should just describe the existing conventions for packaging Python stuff, and the OS can insulate itself in terms of that. Definitely it shouldn't be changed as part of standardization unless the distributors are asking for it loudly. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061129/c168fb09/attachment.htm From pje at telecommunity.com Wed Nov 29 01:10:15 2006 From: pje at telecommunity.com (Phillip J.
Eby) Date: Tue, 28 Nov 2006 19:10:15 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20061128190845.02865cd0@sparrow.telecommunity.com> At 06:41 PM 11/28/2006 -0500, Barry Warsaw wrote: >On Nov 28, 2006, at 4:19 PM, Phillip J. Eby wrote: >>At 01:05 PM 11/28/2006 -0800, Guido van Rossum wrote: >>>On 11/28/06, Barry Warsaw wrote: >>> > There's a related issue that may or may not be in scope for this >>> > thread. For distros like Gentoo or Ubuntu that rely heavily on >>>their >>> > own system Python for the OS to work properly, I'm quite loathe to >>> > install Cheeseshop packages into the system site-packages. >>> >>>I wonder if would help if we were to add a vendor-packages directory >>>where distros can put their own selection of 3rd party stuff they >>>depend on, to be searched before site-packages, and a command-line >>>switch that ignores site-package but still searches vendor-package. >>>(-S would almost do it but probably suppresses too much.) >> >>They could also use -S and then explicitly insert the vendor- packages >>directory into sys.path at the beginning of their scripts. > >Possibly, but stuff like this can be a pain because your dependent >app must build in the infrastructure itself to get the right paths >set up for its scripts. >... >Maybe there's no better way of doing this and applications are best >left to their own devices. But in the back of my mind, I keep >thinking there should be a better way. ;) Well, you can always use setuptools, which generates script wrappers that import the desired module and call a function, after first setting up sys.path. 
:) From barry at python.org Wed Nov 29 00:41:37 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 18:41:37 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 4:19 PM, Phillip J. Eby wrote: > At 01:05 PM 11/28/2006 -0800, Guido van Rossum wrote: >> On 11/28/06, Barry Warsaw wrote: >> > There's a related issue that may or may not be in scope for this >> > thread. For distros like Gentoo or Ubuntu that rely heavily on >> their >> > own system Python for the OS to work properly, I'm quite loathe to >> > install Cheeseshop packages into the system site-packages. >> >> I wonder if would help if we were to add a vendor-packages directory >> where distros can put their own selection of 3rd party stuff they >> depend on, to be searched before site-packages, and a command-line >> switch that ignores site-package but still searches vendor-package. >> (-S would almost do it but probably suppresses too much.) > > They could also use -S and then explicitly insert the vendor- > packages directory into sys.path at the beginning of their scripts. Possibly, but stuff like this can be a pain because your dependent app must build in the infrastructure itself to get the right paths set up for its scripts. An approach I've used in the past is to put a paths.py file in the bin directory and force every script to "import paths" before it imports anything it doesn't want to get from the stdlib (including overrides). paths.py is actually generated though because the user could specify an alternative Python with a configure switch. 
What I'm moving to now though is a sort of 'shell' or driver script which does that path setup once, then imports a module based on argv[0], sniffing out a main() and then calling that. The trick then of course is that you symlink all the top-level user scripts to this shell. Works fine if all you care about is *nix, but it does mean an application with lots of entry-level scripts has to build all this infrastructure itself. Maybe there's no better way of doing this and applications are best left to their own devices. But in the back of my mind, I keep thinking there should be a better way. ;) - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWzJMXEjvBPtnXfVAQJHmAP/UhGUv1Wxt2AzGT08dM9/M0J4pahGnrF3 VwbrdRTF6Jt32iAKAJolrnTE+XlMaTGitYv+mu8v3SgJLWwe+aeJwpg8AdOn5jBL bSjBpE9UeqUSiMhaJmBbx/z5ISv4OioJLX+vzBv6u0yBTYv4uoYZPKoeMcCe6Afw 7e1gIL1WHL4= =scvm -----END PGP SIGNATURE----- From barry at python.org Wed Nov 29 01:26:29 2006 From: barry at python.org (Barry Warsaw) Date: Tue, 28 Nov 2006 19:26:29 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061128190845.02865cd0@sparrow.telecommunity.com> References: <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <5.1.1.6.0.20061128161557.02863e88@sparrow.telecommunity.com> <5.1.1.6.0.20061128190845.02865cd0@sparrow.telecommunity.com> Message-ID: <91BBC5A5-00C1-4063-B10F-6BDA8BD19E59@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 28, 2006, at 7:10 PM, Phillip J. Eby wrote: > Well, you can always use setuptools, which generates script > wrappers that import the desired module and call a function, after > first setting up sys.path. :) That's so 21st Century! Where was setuptools back in 1996? :) Seriously though, that does sound cool, and thanks for the tip.
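The symlinked driver-script scheme Barry describes above can be sketched roughly as follows. The application layout is simulated with a temporary directory; a real application would point at its private library directory, and every user-visible command would be a symlink to this one script:

```python
import os
import sys
import tempfile

# Rough sketch of the driver-script scheme described above: every
# user-visible command is a symlink to a single driver script, which
# does the sys.path setup once, imports the module named after the
# symlink, then "sniffs out" and calls its main().
def dispatch(argv, app_lib):
    if app_lib not in sys.path:
        sys.path.insert(0, app_lib)       # path setup happens once, here
    command = os.path.basename(argv[0])   # e.g. a symlink named "newlist"
    module = __import__(command)          # imports <command>.py from app_lib
    main = getattr(module, "main", None)
    if main is None:
        raise SystemExit("%s: no main() function found" % command)
    return main(argv[1:])

# Simulate an installed entry-point module called "hello":
app_lib = tempfile.mkdtemp()
with open(os.path.join(app_lib, "hello.py"), "w") as f:
    f.write("def main(args):\n    return 'hello ' + ' '.join(args)\n")

result = dispatch(["/usr/local/myapp/bin/hello", "world"], app_lib)
print(result)  # → hello world
```

This is exactly the shape of wrapper that setuptools generates per script, as Phillip notes; the driver-script variant just shares one wrapper across all entry points via symlinks.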
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRWzTtXEjvBPtnXfVAQKDwgP+N/nGkHm7e9ZK+DmTEx+gOxPkeQnpKcA2 AHLg9WLJhLHlrxlekftm3F1+YNQv9R6tthRKu6Zgz5fJTPs57MluJ4qAzPapDymT oGX5Y3HxCdaqrw0HWviuJeUr8euN7NIghUAsEbe51pppfbTs80dGnDrDRL4AfXGm 4/C9DW2URkQ= =pt5u -----END PGP SIGNATURE----- From Daniel.Trstenjak at science-computing.de Wed Nov 29 10:06:03 2006 From: Daniel.Trstenjak at science-computing.de (Daniel Trstenjak) Date: Wed, 29 Nov 2006 09:06:03 +0000 (UTC) Subject: [Python-Dev] Objecttype of 'locals' argument in PyEval_EvalCode Message-ID: <20061129090611.GA19856@bug.science-computing.de> Hi all, I would like to know the definition of the 'locals' object given to PyEval_EvalCode. Does 'locals' have to be a python dictionary or a subtype of a python dictionary, or is it enough if the object implements the necessary protocols? The python implementation behaves differently for the two following code lines: from modul import symbol from modul import * In the case of the first one, it's enough if the object 'locals' implements the necessary protocols. The second one only works if the object 'locals' is the dictionary type or a subtype of it. The problem lies in Python-2.5/Python/ceval.c: static int import_all_from(PyObject *locals, PyObject *v) { ... 4046 value = PyObject_GetAttr(v, name); 4047 if (value == NULL) 4048 err = -1; 4049 else >>> 4050 err = PyDict_SetItem(locals, name, value); 4051 Py_DECREF(name); ... } Changing PyDict_SetItem in line 4050 to PyObject_SetItem could fix it.
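The asymmetry Daniel describes — name binding goes through the generic object protocol only when the namespace is not an exact dict — can be observed from pure Python. This sketch relies on a CPython implementation detail: the name-store opcode falls back to PyObject_SetItem for anything that is not exactly a dict, so a dict subclass sees every store:

```python
# A namespace that records every store made into it.  Because it is a
# dict *subclass* rather than an exact dict, CPython's STORE_NAME opcode
# takes the generic PyObject_SetItem path, so __setitem__ is called.
class LoggingNamespace(dict):
    def __init__(self):
        super().__init__()
        self.stored = []

    def __setitem__(self, key, value):
        self.stored.append(key)
        super().__setitem__(key, value)

ns = LoggingNamespace()
exec("x = 1\ny = x + 1", {}, ns)
print(ns.stored)  # → ['x', 'y']
print(ns["y"])    # → 2
```

The direct PyDict_SetItem call in import_all_from bypasses exactly this hook, which is why `from modul import *` behaved differently from a plain `from modul import symbol`.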
Best Regards, Daniel From Jack.Jansen at cwi.nl Wed Nov 29 10:34:26 2006 From: Jack.Jansen at cwi.nl (Jack Jansen) Date: Wed, 29 Nov 2006 10:34:26 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: On 28-nov-2006, at 22:05, Guido van Rossum wrote: > On 11/28/06, Barry Warsaw wrote: >> There's a related issue that may or may not be in scope for this >> thread. For distros like Gentoo or Ubuntu that rely heavily on their >> own system Python for the OS to work properly, I'm quite loathe to >> install Cheeseshop packages into the system site-packages. > > I wonder if it would help if we were to add a vendor-packages directory > where distros can put their own selection of 3rd party stuff they > depend on, to be searched before site-packages, and a command-line > switch that ignores site-packages but still searches vendor-packages. > (-S would almost do it but probably suppresses too much.) +1. We've been running into this problem on the Mac since Apple started shipping Python. There's another standard place that is searched on MacOS: a per-user package directory ~/Library/Python/2.5/site-packages (the name "site-packages" is a misnomer, really). Standardising something here is less important than for vendor-packages (as the effect can easily be gotten by adding things to PYTHONPATH) but it has one advantage: distutils and such could be taught about it and provide an option to install either systemwide or for the current user only.
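A per-user directory of the kind Jack describes can be computed portably. The sketch below uses the ~/.local layout proposed later in this thread (and eventually standardized for Unix by PEP 370); the exact path is illustrative, and a real implementation would live in site.py and also handle platform variants such as ~/Library/Python/X.Y/site-packages on the Mac:

```python
import os
import sys

# Compute a per-user site-packages path in the style discussed in this
# thread: ~/.local/lib/pythonX.Y/site-packages on Unix.  The layout is
# an assumption for illustration, not what 2006-era site.py did.
user_site = os.path.expanduser(os.path.join(
    "~", ".local", "lib",
    "python%d.%d" % sys.version_info[:2],
    "site-packages",
))

# Put it ahead of the system-wide directories so per-user installs win:
if user_site not in sys.path:
    sys.path.insert(0, user_site)

print(user_site)
```

Exposing the computed path as a module-level name (rather than "just some random entry on sys.path") is exactly what glyph asks for below in the form of a hypothetical site.userinstdir.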
-- Jack Jansen, , http://www.cwi.nl/~jack If I can't dance I don't want to be part of your revolution -- Emma Goldman From glyph at divmod.com Wed Nov 29 11:18:26 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Wed, 29 Nov 2006 10:18:26 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> On 09:34 am, jack.jansen at cwi.nl wrote: >There's another standard place that is searched on MacOS: a per-user >package directory ~/Library/Python/2.5/site-packages (the name "site- >packages" is a misnomer, really). Standardising something here is >less important than for vendor-packages (as the effect can easily be >gotten by adding things to PYTHONPATH) but it has one advantage: >distutils and such could be taught about it and provide an option to >install either systemwide or for the current user only. Yes, let's do that, please. I've long been annoyed that site.py sets up a local user installation directory, a very useful feature, but _only_ on OS X. I've long since promoted my personal hack to add a local user installation directory into a public project -- divmod's "Combinator" -- but it would definitely be preferable for Python to do something sane by default (and have setuptools et. al. support it). I'd suggest using "~/.local/lib/pythonX.X/site-packages" for the "official" UNIX installation location, since it's what we're already using, and ~/.local seems like a convention being slowly adopted by GNOME and the like. I don't know the cultural equivalent in Windows - "%USERPROFILE%\Application Data\PythonXX" maybe? It would be nice if site.py would do this in the same place as it sets up the "darwin"-specific path, and to set that path as a module global, so packaging tools could use "site.userinstdir" or something. Right now, if it's present, it's just some random entry on sys.path. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mail.python.org/pipermail/python-dev/attachments/20061129/63ad3d9a/attachment.html From arigo at tunes.org Wed Nov 29 12:23:54 2006 From: arigo at tunes.org (Armin Rigo) Date: Wed, 29 Nov 2006 12:23:54 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <200611290053.17199.anthony@interlink.com.au> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <20061129112354.GA30665@code0.codespeak.net> Hi Anthony, On Wed, Nov 29, 2006 at 12:53:14AM +1100, Anthony Baxter wrote: > > python2.4 distutils is excluded by default. > > I still have no idea why this was one - I was also one of the people > who jumped up and down asking Debian/Ubuntu to fix this idiotic > decision. I could not agree more. Nowadays, whenever I get an account on a new Linux machine, the first thing I have to do is reinstall Python correctly in my home dir because the system Python lacks distutils. Wasteful. (There are some applications and libraries that use distutils at run-time to compile things, and I'm using such applications and libraries on a daily basis.) Armin From guido at python.org Wed Nov 29 16:39:25 2006 From: guido at python.org (Guido van Rossum) Date: Wed, 29 Nov 2006 07:39:25 -0800 Subject: [Python-Dev] Objecttype of 'locals' argument in PyEval_EvalCode In-Reply-To: <20061129090611.GA19856@bug.science-computing.de> References: <20061129090611.GA19856@bug.science-computing.de> Message-ID: This seems a bug. In revision 36714 by Raymond Hettinger, the restriction that locals be a dict was relaxed to allow any mapping. On 11/29/06, Daniel Trstenjak wrote: > > Hi all, > > I would like to know the definition of the 'locals' object given to > PyEval_EvalCode. Has 'locals' to be a python dictionary or a subtype > of a python dictionary, or is it enough if the object implements the > necessary protocols? 
> > The python implementation behaves different for the two following code > lines: > > from modul import symbol > from modul import * > > In the case of the first one, it's enough if the object 'locals' implements > the necessary protocols. The second one only works if the object 'locals' > is a type or subtype of dictionary. > > The problem lies in Python-2.5/Python/ceval.c: > > static int > import_all_from(PyObject *locals, PyObject *v) > { > ... > 4046 value = PyObject_GetAttr(v, name); > 4047 if (value == NULL) > 4048 err = -1; > 4049 else > >>> 4050 err = PyDict_SetItem(locals, name, value); > 4051 Py_DECREF(name); > ... > } > > Changing PyDict_SetItem in line 4050 with PyObject_SetAttr could fix it. > > > Best Regards, > Daniel > > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From martin at v.loewis.de Wed Nov 29 22:05:55 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Wed, 29 Nov 2006 22:05:55 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> Message-ID: <456DF633.5070103@v.loewis.de> Guido van Rossum schrieb: > I wonder if would help if we were to add a vendor-packages directory > where distros can put their own selection of 3rd party stuff they > depend on, to be searched before site-packages, and a command-line > switch that ignores site-package but still searches vendor-package. > (-S would almost do it but probably suppresses too much.) Patch #1298835 implements such a vendor-packages directory. I have reopened the patch to reconsider it. 
I take your message as a +1 for that feature. Regards, Martin From arigo at tunes.org Wed Nov 29 23:10:11 2006 From: arigo at tunes.org (Armin Rigo) Date: Wed, 29 Nov 2006 23:10:11 +0100 Subject: [Python-Dev] Objecttype of 'locals' argument in PyEval_EvalCode In-Reply-To: References: <20061129090611.GA19856@bug.science-computing.de> Message-ID: <20061129221011.GA28156@code0.codespeak.net> Hi, On Wed, Nov 29, 2006 at 07:39:25AM -0800, Guido van Rossum wrote: > This seems a bug. In revision 36714 by Raymond Hettinger, the > restriction that locals be a dict was relaxed to allow any mapping. Mea culpa, I thought I reviewed this patch at the time. Fixed in r52862-52863. A bientot, Armin From barry at python.org Thu Nov 30 00:49:52 2006 From: barry at python.org (Barry Warsaw) Date: Wed, 29 Nov 2006 18:49:52 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> Message-ID: <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 29, 2006, at 5:18 AM, glyph at divmod.com wrote: > Yes, let's do that, please. I've long been annoyed that site.py > sets up a local user installation directory, a very useful feature, > but _only_ on OS X. I've long since promoted my personal hack to > add a local user installation directory into a public project -- > divmod's "Combinator" -- but it would definitely be preferable for > Python to do something sane by default (and have setuptools et. al. > support it). > > I'd suggest using "~/.local/lib/pythonX.X/site-packages" for the > "official" UNIX installation location, since it's what we're > already using, and ~/.local seems like a convention being slowly > adopted by GNOME and the like. I don't know the cultural > equivalent in Windows - "%USERPROFILE%\Application Data\PythonXX" > maybe? 
> > It would be nice if site.py would do this in the same place as it > sets up the "darwin"-specific path, and to set that path as a > module global, so packaging tools could use "site.userinstdir" or > something. Right now, if it's present, it's just some random entry > on sys.path. +1 from me also for the concept. I'm not sure I like ~/.local though - -- it seems counter to the app-specific dot-file approach old schoolers like me are used to. OTOH, if that's a convention being promoted by GNOME and other frameworks, then I don't have too much objection. I also think that setuptools has the potential to be a big improvement here because it's much easier to install and use egg files than it is to get distutils to DTRT with setup.py. (I still detest the command name 'easy_install' but hey that's still fixable right? :). What might be nice would be to build a little more infrastructure into Python to support eggs, by say adding a default PEP 302 style importer that knows how to search for eggs in 'nests' (a directory containing a bunch of eggs). What if then that importer were general enough, or had a subclass that implemented a policy for applications where /lib/ pythonX.X/app-packages/ became a nest directory. All my app would have to do would be to drop an instance of one of those in the right place on sys.path and Python would pick up all the eggs in my app-package directory. Further, easy_install could then grow an -- install-app switch or somesuch that would install the egg in the app- package directory. I haven't really thought this through so maybe it's a stupid idea, but ISTM that would make management, installation, and use in an application about as simple as possible. (Oh yeah, add an -- uninstall switch too :). 
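[Archive note] Barry's "nest" importer is straightforward to prototype with today's importlib. The sketch below is not setuptools API and every name in it is invented; it only handles top-level modules stored as plain .py files inside *.egg directories of a single nest directory:

```python
import importlib.abc
import importlib.util
import os

class NestFinder(importlib.abc.MetaPathFinder):
    """Toy meta-path finder: look for fullname.py inside every
    '*.egg' subdirectory of one 'nest' directory."""

    def __init__(self, nest):
        self.nest = nest

    def find_spec(self, fullname, path=None, target=None):
        if "." in fullname:               # top-level modules only
            return None
        for entry in sorted(os.listdir(self.nest)):
            if not entry.endswith(".egg"):
                continue
            candidate = os.path.join(self.nest, entry, fullname + ".py")
            if os.path.isfile(candidate):
                return importlib.util.spec_from_file_location(fullname, candidate)
        return None
```

Activating it is one line, e.g. sys.meta_path.append(NestFinder("/path/to/nest")), after which any egg dropped into the nest becomes importable.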
- -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRW4cpnEjvBPtnXfVAQK/7wP/fS/MnVm6Msq6kB3qJce5BOK4NFo0ewGG uephuUfux+AWKMhl6KIIe7xeT6yO4yS/U/DF0sZ35JoOK8ebyH0JO/pup+lCfA3r ODQL45s+G1yycZDjUh3/a9+RakdhpfBRvjU3V/IFH7ayiM9PIHxKjTIzjXo3m1Pq 1hxb5BHS/8I= =kPE7 -----END PGP SIGNATURE----- From greg.ewing at canterbury.ac.nz Thu Nov 30 01:34:03 2006 From: greg.ewing at canterbury.ac.nz (Greg Ewing) Date: Thu, 30 Nov 2006 13:34:03 +1300 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> Message-ID: <456E26FB.2020502@canterbury.ac.nz> Barry Warsaw wrote: > I'm not sure I like ~/.local though > - -- it seems counter to the app-specific dot-file approach old > schoolers like me are used to. Problems with that are starting to show, though. There's a particular Unix account that I've had for quite a number of years, accumulating much stuff. Nowadays when I do ls -a ~, I get a directory listing several screens long... The whole concept of "hidden" files seems ill- considered to me, anyway. It's too easy to forget that they're there. Putting infrequently-referenced stuff in a non-hidden location such as ~/local seems just as good and less magical to me. -- Greg From python at rcn.com Thu Nov 30 03:00:50 2006 From: python at rcn.com (python at rcn.com) Date: Wed, 29 Nov 2006 21:00:50 -0500 (EST) Subject: [Python-Dev] Objecttype of 'locals' argument in PyEval_EvalCode Message-ID: <20061129210050.AOV97481@ms09.lnh.mail.rcn.net> [Guido van Rossum] > This seems a bug. In revision 36714 by Raymond Hettinger, > the restriction that locals be a dict was relaxed to allow > any mapping. [Armin Rigo] > Mea culpa, I thought I reviewed this patch at the time. > Fixed in r52862-52863. Armin, thanks for the check-ins. 
Daniel, thanks for finding one of the cases I missed. Will load a unittest for this one when I get a chance. Raymond From robinbryce at gmail.com Thu Nov 30 03:01:56 2006 From: robinbryce at gmail.com (Robin Bryce) Date: Thu, 30 Nov 2006 02:01:56 +0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456C7D20.1080604@v.loewis.de> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <456C7D20.1080604@v.loewis.de> Message-ID: On 28/11/06, "Martin v. L?wis" wrote: > I personally agree that "Linux standards" should specify a standard > layout for a Python installation, and that it should be the one that > "make install" generates (perhaps after "make install" is adjusted). > Whether or not it is the *LSB* that needs to specify that, I don't > know, because the LSB does not specify a file system layout. Instead, > it incorporates the FHS - which might be the right place to define > the layout of a Python installation. For the LSB, it's more import > that "import httplib" gives you something working, no matter where > httplib.py comes from (or whether it comes from httplib.py at all). Yes, especially with the regard to the level you pitch for LSB. I would go as far as to say that if this "contract in spirit" is broken by vendor repackaging they should: * Call the binaries something else because it is NOT python any more. * Setup the installation layout so that it does NOT conflict or overlap with the standard layout. * Call the whole package something else. But I can't see that happening. Is it a bad idea to suggest that: Python grows a vendor_variant attribute somewhere in the standard lib; That its content is completely dictated by a new ./configure argument which is the empty string by default; And, request that it is left empty by re-packagers if the installation is 'reasonably standard' ? 
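[Archive note] Robin's attribute is purely a proposal, so any sketch has to guard the name; the early-failure pattern he has in mind would look roughly like this (sys.vendor_variant does not exist in any released CPython):

```python
import sys

# 'vendor_variant' is the *hypothetical* attribute from Robin's
# proposal; getattr() with a default keeps this safe on real builds.
variant = getattr(sys, "vendor_variant", "")
if variant:
    raise SystemExit("this tool needs a stock CPython, got: " + variant)
print(repr(variant))   # '' on an unmodified interpreter
```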
I would strongly prefer _not_ write code that is conditional on such an attribute. However if there was a clear way for a vendor to communicate "This is not a standard python runtime" to the python run time, early failure (in the application) with informative error messages becomes much more viable. Eg sys.vendor_variant would be orthogonal to sys.version and sys.version_info Given: python -c "import sys; print sys.version" GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5) A regex on sys.version does not seem like a good way to get positive confirmation I'm using the "Canonical" variant (pun intended) python -c "from distutils.util import get_platform; print get_platform()" Tells me nothing about the vendor of my linux distribution. Except, ironically, when it says ImportError Cheers, Robin From glyph at divmod.com Thu Nov 30 04:20:36 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 03:20:36 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> On 29 Nov, 11:49 pm, barry at python.org wrote: >On Nov 29, 2006, at 5:18 AM, glyph at divmod.com wrote: >> I'd suggest using "~/.local/lib/pythonX.X/site-packages" for the >> "official" UNIX installation location, ... >+1 from me also for the concept. I'm not sure I like ~/.local though >- -- it seems counter to the app-specific dot-file approach old >schoolers like me are used to. OTOH, if that's a convention being >promoted by GNOME and other frameworks, then I don't have too much >objection. Thanks. I just had a look at the code in Combinator which sets this up and it turns out it's horribly inconsistent and buggy. It doesn't really work on any platform other than Linux. I'll try to clean it up in the next few days so it can serve as an example. GNOME et. al. aren't promoting the concept too hard. It's just the first convention I came across. 
(Pardon the lack of references here, but it's very hard to google for "~/.local" - I just know that I was looking for a convention when I wrote combinator, and this is the one I found.) The major advantage ~/.local has for *nix systems is the ability to have a parallel *bin* directory, which provides the user one location to set their $PATH to, so that installed scripts work as expected, rather than having to edit a bunch of .foorc files to add to your environment with each additional package. After all, what's the point of a per-user "install" if the software isn't actually installed in any meaningful way, and you have to manually edit your shell startup scripts, log out and log in again anyway? Another nice feature there is that it uses a pre-existing layout convention (bin lib share etc ...) rather than attempting to build a new one, so the only thing that has to change about the package installation is the root. Finally, I know there are quite a few Python developers out there already using Combinator, so at least there it's an established convention :). >I also think that setuptools has the potential to be a big >improvement here because it's much easier to install and use egg >files than it is to get distutils to DTRT with setup.py. (I still >detest the command name 'easy_install' but hey that's still fixable >right? :). What might be nice would be to build a little more >infrastructure into Python to support eggs, by say adding a default >PEP 302 style importer that knows how to search for eggs in >'nests' (a directory containing a bunch of eggs). One of the things that combinator hacks is where distutils thinks it should install to - when *I* type "python setup.py install" nothing tries to insert itself into system directories (those are for Ubuntu, not me) - ~/.local is the *default* install location. I haven't managed to make this feature work with eggs yet, but I haven't done a lot of work with setuptools. 
On the "easy_install" naming front, how about "layegg"? >What if then that importer were general enough (...) These all sound like interesting ideas, but they're starting to get pretty far afield - I wish I had more time to share ideas about packaging, but I know too well that I'm not going to be able to back them up with any implementation effort. I'd really like Python to use the ~/.local/bin / ~/.local/lib convention for installing packages, though. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061130/82573734/attachment.html From glyph at divmod.com Thu Nov 30 04:32:35 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 03:32:35 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130033235.11053.1425332311.divmod.xquotient.907@joule.divmod.com> On 12:34 am, greg.ewing at canterbury.ac.nz wrote: >The whole concept of "hidden" files seems ill- >considered to me, anyway. It's too easy to forget >that they're there. Putting infrequently-referenced >stuff in a non-hidden location such as ~/local >seems just as good and less magical to me. Something like "~/.local" is an implementation detail, not something that should be exposed to non-savvy users. It's easy enough for an expert to "show" it if they want to - "ln -s .local local" - but impossible for someone more naive to hide if they don't understand what it is or what it's for. (And if they try, by clicking a checkbox in Nautilus or somesuch, *all* their installed software breaks.) This approach doesn't really work unless you have good support from the OS, so it can warn you you're about to do something crazy. UI designers tend to get adamant about this sort of thing, but I'll admit they go both ways, some saying that everything should be exposed to the user, some saying that all details should be hidden by default. 
Still, in the more recent UNIX desktops, the "let's hide the things that the user shouldn't see and just work really hard to make them work right all the time" camp seems to be winning. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061130/6118882b/attachment.htm From fdrake at acm.org Thu Nov 30 05:11:58 2006 From: fdrake at acm.org (Fred L. Drake, Jr.) Date: Wed, 29 Nov 2006 23:11:58 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> Message-ID: <200611292311.59096.fdrake@acm.org> On Wednesday 29 November 2006 22:20, glyph at divmod.com wrote: > GNOME et. al. aren't promoting the concept too hard. It's just the first > convention I came across. (Pardon the lack of references here, but it's > very hard to google for "~/.local" - I just know that I was looking for a > convention when I wrote combinator, and this is the one I found.) ~/.local/ is described in the "XDG Base Directory Specification": http://standards.freedesktop.org/basedir-spec/latest/ > On the "easy_install" naming front, how about "layegg"? Actually, why not just "egg"? That's parallel to "rpm" at least, and there isn't such a command installed on my Ubuntu box already. (Using synaptic to search for "egg" resulted in little that actually had "egg" in the name or short description; there was wnn7egg (a Wnn7 input method), but that's really it.) -Fred -- Fred L. Drake, Jr. From pje at telecommunity.com Thu Nov 30 05:36:19 2006 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Wed, 29 Nov 2006 23:36:19 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130032036.11053.1356768333.divmod.xquotient.888@joule .divmod.com> Message-ID: <5.1.1.6.0.20061129233347.03a1d458@sparrow.telecommunity.com> At 03:20 AM 11/30/2006 +0000, glyph at divmod.com wrote: >One of the things that combinator hacks is where distutils thinks it >should install to - when *I* type "python setup.py install" nothing tries >to insert itself into system directories (those are for Ubuntu, not me) - >~/.local is the *default* install location. I haven't managed to make >this feature work with eggs yet, but I haven't done a lot of work with >setuptools. easy_install uses the standard distutils configuration system, which means that you can do e.g. [install] prefix = ~/.local in ./setup.cfg, ~/.pydistutils.cfg, or /usr/lib/python2.x/distutils/distutils.cfg to set the default installation prefix. Setuptools (and distutils!) will then install libraries to ~/.local/lib/python2.x/site-packages and scripts to ~/.local/bin. From pje at telecommunity.com Thu Nov 30 05:45:06 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Wed, 29 Nov 2006 23:45:06 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> Message-ID: <5.1.1.6.0.20061129233752.042aabe8@sparrow.telecommunity.com> At 06:49 PM 11/29/2006 -0500, Barry Warsaw wrote: >What might be nice would be to build a little more >infrastructure into Python to support eggs, by say adding a default >PEP 302 style importer that knows how to search for eggs in >'nests' (a directory containing a bunch of eggs). If you have setuptools generate your scripts, the eggs are searched for and added to sys.path automatically, with no need for a separate importer. 
If you write standalone scripts (not using "setup.py develop" or "setup.py install"), you can use pkg_resources.require() to find eggs and add them to sys.path manually. If you want eggs available when you start Python, easy_install puts them on sys.path using .pth files by default. So, I'm not clear on what use case you have in mind for this importer, or how you think it would work. (Any .egg file in a sys.path directory is already automatically discoverable by the means described above.) >What if then that importer were general enough, or had a subclass >that implemented a policy for applications where /lib/ >pythonX.X/app-packages/ became a nest directory. Simply installing your scripts to the same directory as the eggs they require, is sufficient to ensure this. Also, since eggs are versioned, nothing stops you from having one giant systemwide egg directory. Setuptools-generated scripts automatically adjust their sys.path to include the specific eggs they need - and "need" can be specified to an exact version if desired (e.g. for system admin tools). >I haven't really thought this through so maybe it's a stupid idea, >but ISTM that would make management, installation, and use in an >application about as simple as possible. (Oh yeah, add an -- >uninstall switch too :). Yeah, that's targeted for the "nest" package management tool, which I may have some time to work on someday, in my copious free time. :) In the meantime, 'easy_install -Nm eggname; rm -rf /path/to/the.egg' takes care of everything but the scripts. From glyph at divmod.com Thu Nov 30 06:06:18 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 05:06:18 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130050618.11053.1949245277.divmod.xquotient.913@joule.divmod.com> On 04:11 am, fdrake at acm.org wrote: >On Wednesday 29 November 2006 22:20, glyph at divmod.com wrote: > > GNOME et. al. aren't promoting the concept too hard. 
It's just the first > > convention I came across. (Pardon the lack of references here, but it's > > very hard to google for "~/.local" - I just know that I was looking for a > > convention when I wrote combinator, and this is the one I found.) > >~/.local/ is described in the "XDG Base Directory Specification": > > http://standards.freedesktop.org/basedir-spec/latest/ Thanks for digging that up! Not a whole lot of meat there, but at least it gives me some env vars to set / check... > > On the "easy_install" naming front, how about "layegg"? > >Actually, why not just "egg"? That works for me. I assumed there was some other reason the obvious answer hadn't been chosen :). -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061130/5ecc1583/attachment.htm From barry at python.org Thu Nov 30 06:09:30 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Nov 2006 00:09:30 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 29, 2006, at 10:20 PM, glyph at divmod.com wrote: > Another nice feature there is that it uses a pre-existing layout > convention (bin lib share etc ...) rather than attempting to build > a new one, so the only thing that has to change about the package > installation is the root. That's an excellent point, because in configure-speak I guess you could just use --prefix=/.local and everything would lay out correctly. (I guess that's the whole point, eh? 
:) > One of the things that combinator hacks is where distutils thinks > it should install to - when *I* type "python setup.py install" > nothing tries to insert itself into system directories (those are > for Ubuntu, not me) - ~/.local is the *default* install location. > I haven't managed to make this feature work with eggs yet, but I > haven't done a lot of work with setuptools. That's really nice. So if I "sudo python setup.py install" it'll see uid 0 and install in the system location? > On the "easy_install" naming front, how about "layegg"? I think I once proposed "hatch" but that may not be quite the right word (where's Ken M when you need him? :). > >What if then that importer were general enough (...) > > These all sound like interesting ideas, but they're starting to get > pretty far afield - I wish I had more time to share ideas about > packaging, but I know too well that I'm not going to be able to > back them up with any implementation effort. Yeah, same here, so I'll shut up now. > I'd really like Python to use the ~/.local/bin / ~/.local/lib > convention for installing packages, though. I'm sold. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRW5ninEjvBPtnXfVAQKZBgP+MC1p3ipJbJn8ayhYyO73hdeWHpeHWd82 F4pFwkAuiXMWZ9/le1XW61+ODfSSti0RbBEiJeuul5dHP7+DlhXHyXrCf6Zzab4e PTerySTgc8AtI8L2VZzAaVU9PlzmKw0dp4s2pigNbGb3FRbH/m/ZwhSSYfeQTA3U gdA5YQq7CD0= =CJ9T -----END PGP SIGNATURE----- From glyph at divmod.com Thu Nov 30 06:10:46 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 05:10:46 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130051046.11053.1680534272.divmod.xquotient.922@joule.divmod.com> On 04:36 am, pje at telecommunity.com wrote: >easy_install uses the standard distutils configuration system, which means >that you can do e.g. Hmm. I thought I knew quite a lot about distutils, but this particular nugget had evaded me. Thanks! 
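[Archive note] The convention Barry is sold on here was standardized shortly afterwards as PEP 370 (per-user site-packages, Python 2.6/3.0). On a POSIX build, a modern interpreter reports the per-user locations directly:

```python
import sysconfig

# The 'posix_user' scheme resolves to the user base (~/.local unless
# PYTHONUSERBASE overrides it), exactly the layout argued for above.
print(sysconfig.get_path("purelib", "posix_user"))  # .../lib/pythonX.Y/site-packages
print(sysconfig.get_path("scripts", "posix_user"))  # .../bin
```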
I see that it's mentioned in the documentation, but I never thought to look in that section. I have an aversion to .ini files; I tend to assume there's always an equivalent Python expression, and it's better. Is there an equivalent Python API in this case? I don't know if this is a personal quirk of mine, or a reinforcement of Talin's point about the audience for documentation documentation. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.python.org/pipermail/python-dev/attachments/20061130/a3da98c6/attachment.html From barry at python.org Thu Nov 30 06:17:58 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Nov 2006 00:17:58 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <5.1.1.6.0.20061129233752.042aabe8@sparrow.telecommunity.com> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <5.1.1.6.0.20061129233752.042aabe8@sparrow.telecommunity.com> Message-ID: <04EB4315-008A-460F-8DE2-D322634ED9BA@python.org> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Nov 29, 2006, at 11:45 PM, Phillip J. Eby wrote: [Phillip describes a bunch of things I didn't know about setuptools] As is often the case, maybe everything I want is already there and I've just been looking in the wrong places. :) Thanks! I'll read up on that stuff. - -Barry -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.5 (Darwin) iQCVAwUBRW5phnEjvBPtnXfVAQKwLgP/doK7aF5zGknK4JCv+rjO4xXKWRwjB0Vk B08Ee2HlSTcqSe8YIqMOSCRa8LcW86hEFipJmIi8vzcPv0Tr6y+i6yMTq0zhYeyh lvc7E7wdMY+U78/+ffeDLBNESXkZRzaiv0aH4ZkBf3xOebj58vCNBHlmzfT0WeFj EMnJut6jOnM= =mlIW -----END PGP SIGNATURE----- From pje at telecommunity.com Thu Nov 30 06:34:33 2006 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Thu, 30 Nov 2006 00:34:33 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130051046.11053.1680534272.divmod.xquotient.922@joule .divmod.com> Message-ID: <5.1.1.6.0.20061130003332.04636428@sparrow.telecommunity.com> At 05:10 AM 11/30/2006 +0000, glyph at divmod.com wrote: >On 04:36 am, pje at telecommunity.com wrote: > > >easy_install uses the standard distutils configuration system, which means > >that you can do e.g. > >Hmm. I thought I knew quite a lot about distutils, but this particular >nugget had evaded me. Thanks! I see that it's mentioned in the >documentation, but I never thought to look in that section. I have an >aversion to .ini files; I tend to assume there's always an equivalent >Python expression, and it's better. Is there an equivalent Python API in >this case? Well, in a setup.py there's an options or some such that can be used to provide effective command-line option overrides in-line, but that doesn't help for systemwide default configurations, like the files I mentioned. It's effectively only a substitute for setup.cfg. From martin at v.loewis.de Thu Nov 30 07:12:33 2006 From: martin at v.loewis.de (=?ISO-8859-1?Q?=22Martin_v=2E_L=F6wis=22?=) Date: Thu, 30 Nov 2006 07:12:33 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <456AEA45.7060209@suse.cz> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <456C7D20.1080604@v.loewis.de> Message-ID: <456E7651.7070100@v.loewis.de> Robin Bryce schrieb: > Yes, especially with the regard to the level you pitch for LSB. I > would go as far as to say that if this "contract in spirit" is broken > by vendor repackaging they should: > * Call the binaries something else because it is NOT python any more. > * Setup the installation layout so that it does NOT conflict or > overlap with the standard layout. > * Call the whole package something else. 
I think that would be counter-productive. If applied in a strict sense, you couldn't call it Python anymore if it isn't in /usr/local. I see no point to that. It shouldn't be called Python anymore if it doesn't implement the Python language specification. No vendor is modifying it in such a way that print "Hello" stops working. > Is it a bad idea to suggest that: Python grows a vendor_variant > attribute somewhere in the standard lib; That its content is > completely dictated by a new ./configure argument which is the empty > string by default; And, request that it is left empty by re-packagers > if the installation is 'reasonably standard' ? I'm not sure in what applications that would be useful. > I would strongly prefer _not_ to write code that is conditional on such > an attribute. However if there was a clear way for a vendor to > communicate "This is not a standard python runtime" to the python run > time, early failure (in the application) with informative error > messages becomes much more viable. Again: none of the vendors modifies Python in a way that what you get is "not a standard Python runtime". They *all* are "standard Python runtimes". Regards, Martin From talin at acm.org Thu Nov 30 15:40:55 2006 From: talin at acm.org (Talin) Date: Thu, 30 Nov 2006 06:40:55 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456E26FB.2020502@canterbury.ac.nz> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> <456E26FB.2020502@canterbury.ac.nz> Message-ID: <456EED77.5060009@acm.org> Greg Ewing wrote: > Barry Warsaw wrote: >> I'm not sure I like ~/.local though >> -- it seems counter to the app-specific dot-file approach old >> schoolers like me are used to. > > Problems with that are starting to show, though. > There's a particular Unix account that I've had for > quite a number of years, accumulating much stuff.
> Nowadays when I do ls -a ~, I get a directory > listing several screens long... > > The whole concept of "hidden" files seems ill- > considered to me, anyway. It's too easy to forget > that they're there. Putting infrequently-referenced > stuff in a non-hidden location such as ~/local > seems just as good and less magical to me. On OS X, you of course have ~/Library. I suppose the Linux equivalent would be something like ~/lib. Maybe this is something that we should be asking the LSB folks for advice on? > > -- > Greg > _______________________________________________ > Python-Dev mailing list > Python-Dev at python.org > http://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: http://mail.python.org/mailman/options/python-dev/talin%40acm.org > From talin at acm.org Thu Nov 30 15:49:16 2006 From: talin at acm.org (Talin) Date: Thu, 30 Nov 2006 06:49:16 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> Message-ID: <456EEF6C.40409@acm.org> Barry Warsaw wrote: >> On the "easy_install" naming front, how about "layegg"? > > I think I once proposed "hatch" but that may not be quite the right > word (where's Ken M when you need him? :). I really don't like all these "cute" names, simply because they are obscure. Names that only make sense once you've gotten the joke may be self-gratifying but not good HCI. How about: python -M install Or maybe we could even lobby to get: python --install as a synonym of the above? 
-- Talin From ronaldoussoren at mac.com Thu Nov 30 15:55:02 2006 From: ronaldoussoren at mac.com (Ronald Oussoren) Date: Thu, 30 Nov 2006 06:55:02 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456EEF6C.40409@acm.org> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> <456EEF6C.40409@acm.org> Message-ID: On Thursday, November 30, 2006, at 03:49PM, "Talin" wrote: >Barry Warsaw wrote: >>> On the "easy_install" naming front, how about "layegg"? >> >> I think I once proposed "hatch" but that may not be quite the right >> word (where's Ken M when you need him? :). >I really don't like all these "cute" names, simply because they are >obscure. Names that only make sense once you've gotten the joke may be >self-gratifying but not good HCI. > >How about: > > python -M install > >Or maybe we could even lobby to get: > > python --install > >as a synonym of the above? Maybe because 'install' is just one of the actions? I'd also like to see 'uninstall', 'list' and 'upgrade' actions (and have some very crude code to do this). Ronald From barry at python.org Thu Nov 30 16:20:25 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Nov 2006 10:20:25 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456EED77.5060009@acm.org> References: <20061129101826.11053.667681482.divmod.xquotient.770@joule.divmod.com> <78EBB5BE-1F47-4BA3-B22A-1966C949E25D@python.org> <456E26FB.2020502@canterbury.ac.nz> <456EED77.5060009@acm.org> Message-ID: <671B99D2-3174-4A5E-B12B-D2241EA22959@python.org> On Nov 30, 2006, at 9:40 AM, Talin wrote: > Greg Ewing wrote: >> Barry Warsaw wrote: >>> I'm not sure I like ~/.local though -- it seems counter to the >>> app-specific dot-file approach old schoolers like me are used to. >> Problems with that are starting to show, though.
>> There's a particular Unix account that I've had for >> quite a number of years, accumulating much stuff. >> Nowadays when I do ls -a ~, I get a directory >> listing several screens long... >> The whole concept of "hidden" files seems ill- >> considered to me, anyway. It's too easy to forget >> that they're there. Putting infrequently-referenced >> stuff in a non-hidden location such as ~/local >> seems just as good and less magical to me. > > On OS X, you of course have ~/Library. I suppose the Linux > equivalent would be something like ~/lib. I forgot to add in my previous follow-up why I'd prefer ~/.local over ~/local. It's a namespace thing. Dot-files in my home directory are like __names__ in Python -- they don't belong to me. Non-dot-names are my namespace, so things like ~/local constrain what I can call my own files. When I switched to OS X for most of my desktops, I had several collisions in this namespace. I keep all my homedir files under subversion and could not check out my environment on my new Mac until I renamed a few directories (this was exacerbated by the case-insensitive file system). I think in general OS X has less philosophical problem with colliding in the non-dot namespace because most OS X users don't ever /see/ their home directory. They see ~/Desktop. Maybe that's what all the kids are into these days, but I still think dot-names are better to use for a wider acceptance.
-Barry From barry at python.org Thu Nov 30 16:24:02 2006 From: barry at python.org (Barry Warsaw) Date: Thu, 30 Nov 2006 10:24:02 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <456EEF6C.40409@acm.org> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> <456EEF6C.40409@acm.org> Message-ID: <715A197A-4B46-40BD-AB8D-706C810D81C1@python.org> On Nov 30, 2006, at 9:49 AM, Talin wrote: > I really don't like all these "cute" names, simply because they are > obscure. Names that only make sense once you've gotten the joke may > be self-gratifying but not good HCI. Warsaw's Fifth Law :) > How about: > > python -M install > > Or maybe we could even lobby to get: > > python --install > > as a synonym of the above? As Ronald points out, installing is only one action, and then you have to handle all of its options too. Maybe that means python -M install --install-dir foo --other-setuptools-options would work, but I don't think just bare --install does. I'm also not sure "python -M install" is a big improvement over "egg" or whatever ("egg" actually isn't bad).
-Barry From janssen at parc.com Thu Nov 30 18:37:43 2006 From: janssen at parc.com (Bill Janssen) Date: Thu, 30 Nov 2006 09:37:43 PST Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <715A197A-4B46-40BD-AB8D-706C810D81C1@python.org> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> <456EEF6C.40409@acm.org> <715A197A-4B46-40BD-AB8D-706C810D81C1@python.org> Message-ID: <06Nov30.093750pst."58648"@synergy1.parc.xerox.com> Perhaps "pyinstall"? Bill > On Nov 30, 2006, at 9:49 AM, Talin wrote: > > > I really don't like all these "cute" names, simply because they are > > obscure. Names that only make sense once you've gotten the joke may > > be self-gratifying but not good HCI. > > Warsaw's Fifth Law :) > > > How about: > > > > python -M install > > > > Or maybe we could even lobby to get: > > > > python --install > > > > as a synonym of the above? > > As Ronald points out, installing is only one action, and then you > have to handle all of its options too. Maybe that means > > python -M install --install-dir foo --other-setuptools-options > > would work, but I don't think just bare --install does. I'm also not > sure "python -M install" is a big improvement over "egg" or whatever > ("egg" actually isn't bad). > > -Barry From guido at python.org Thu Nov 30 18:49:25 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Nov 2006 09:49:25 -0800 Subject: [Python-Dev] Small tweak to tokenize.py? Message-ID: I've got a small tweak to tokenize.py that I'd like to run by folks here.
I'm working on a refactoring tool for Python 2.x-to-3.x conversion, and my approach is to build a full parse tree with annotations that show where the whitespace and comments go. I use the tokenize module to scan the input. This is nearly perfect (I can render code from the parse tree and it will be an exact match of the input) except for continuation lines -- while the tokenize gives me pseudo-tokens for comments and "ignored" newlines, it doesn't give me the backslashes at all (while it does give me the newline following the backslash). It would be trivial to add another yield to tokenize.py when the backslash is detected:

--- tokenize.py (revision 52865)
+++ tokenize.py (working copy)
@@ -370,6 +370,8 @@
                 elif initial in namechars:      # ordinary name
                     yield (NAME, token, spos, epos, line)
                 elif initial == '\\':           # continued stmt
+                    # This yield is new; needed for better idempotency:
+                    yield (NL, initial, spos, (spos[0], spos[1]+1), line)
                     continued = 1
                 else:
                     if initial in '([{': parenlev = parenlev + 1

(Though I think that it should probably yield a single NL pseudo-token whose value is a backslash followed by a newline; or perhaps it should yield the backslash as a comment token, or as a new token. Thoughts?) This wouldn't be 100% backwards compatible, so I'm not dreaming of adding this to 2.5.1, but what about 2.6? (There's another issue with tokenize.py too -- when you use it to parse Python-like source code containing non-Python operators, e.g. '?', it does something bogus.) -- --Guido van Rossum (home page: http://www.python.org/~guido/) From glyph at divmod.com Thu Nov 30 19:02:47 2006 From: glyph at divmod.com (glyph at divmod.com) Date: Thu, 30 Nov 2006 18:02:47 -0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) Message-ID: <20061130180247.11053.492404401.divmod.xquotient.954@joule.divmod.com> On 05:37 pm, janssen at parc.com wrote: >Perhaps "pyinstall"?
Keep in mind that Python packages will still generally be *system*-installed with other tools, like dpkg (or apt) and rpm, on systems which have them. The name of the packaging system we're talking about is called either "eggs" or "setuptools" depending on the context. "pyinstall" invites confusion with "the Python installer", which is a different program, used to install Python itself on Windows. It's just a brand. If users can understand that "Excel" means "Spreadsheet", "Outlook" means "E-Mail", and "GIMP" means "Image Editor", then I think we should give them some credit on being able to figure out what the installer program is called. (I don't really care that much in this particular case, but this was one of my pet peeves with GNOME a while back. There was a brief change to the names of everything in the menus to remove all brand-names: "Firefox" became "Web Browser", "Evolution" became "E-Mail", "Rhythmbox" became "Music Player". I remember looking at my applications menu and wondering which of the 3 "music players" that I had installed the menu would run. Thankfully this nonsense stopped and they compromised on names like "Firefox Web Browser" and "GIMP Image Editor".) From pje at telecommunity.com Thu Nov 30 19:11:10 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 30 Nov 2006 13:11:10 -0500 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130180247.11053.492404401.divmod.xquotient.954@joule.divmod.com> Message-ID: <5.1.1.6.0.20061130130533.02880c78@sparrow.telecommunity.com> At 06:02 PM 11/30/2006 +0000, glyph at divmod.com wrote: >On 05:37 pm, janssen at parc.com wrote: > >Perhaps "pyinstall"?
> >Keep in mind that Python packages will still generally be >*system*-installed with other tools, like dpkg (or apt) and rpm, on >systems which have them. The name of the packaging system we're talking >about is called either "eggs" or "setuptools" depending on the context. Just as an FYI, the (planned) name of the packaging program for setuptools is "nest". It doesn't exist yet, however, except for a whole lot of design notes in my outlining program. You'll be able to use commands like "nest list" to show installed projects, "nest source" to fetch a project's source, "nest rm" or "nest uninstall" to uninstall, etc. It's all 100% vaporware at the moment, but that's the plan. I actually looked at other system package managers written in Python (i.e. yum and smart) to use as a possible base for implementing "nest", but unfortunately these are all GPL'd and thus not compatible with the setuptools or Python licenses, so I didn't actually get very far in my evaluation. From pje at telecommunity.com Thu Nov 30 19:22:57 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 30 Nov 2006 13:22:57 -0500 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: Message-ID: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote: >I've got a small tweak to tokenize.py that I'd like to run by folks here. > >I'm working on a refactoring tool for Python 2.x-to-3.x conversion, >and my approach is to build a full parse tree with annotations that >show where the whitespace and comments go. I use the tokenize module >to scan the input. This is nearly perfect (I can render code from the >parse tree and it will be an exact match of the input) except for >continuation lines -- while the tokenize gives me pseudo-tokens for >comments and "ignored" newlines, it doesn't give me the backslashes at >all (while it does give me the newline following the backslash). 
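The gap Guido describes is easy to reproduce with a present-day Python 3 tokenize module, which still behaves this way; this is a minimal sketch for illustration, not code from the thread:

```python
import io
import tokenize

# Tokenize a statement that uses a backslash continuation.
source = "x = 1 + \\\n    2\n"
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))

# No token carries the backslash, so naively joining the token strings
# cannot reproduce the original source text.
assert not any("\\" in tok.string for tok in tokens)
assert "".join(tok.string for tok in tokens) == "x=1+2\n"
```

The intraline spaces are also absent from the joined text, but those can at least be recovered from the token positions; the backslash itself never appears in any token, which is the behavior the proposed patch changes.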
The following routine will render a token stream, and it automatically restores the missing \'s. I don't know if it'll work with your patch, but perhaps you could use it instead of changing tokenize. For the documentation and examples, see: http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-text

def detokenize(tokens, indent=0):
    """Convert `tokens` iterable back to a string."""
    out = []; add = out.append
    lr,lc,last = 0,0,''
    baseindent = None
    for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens):
        # Insert trailing line continuation and blanks for skipped lines
        lr = lr or sr    # first line of input is first line of output
        if sr>lr:
            if last:
                if len(last)>lc:
                    add(last[lc:])
                lr+=1
            if sr>lr:
                add(' '*indent + '\\\n'*(sr-lr))   # blank continuation lines
            lc = 0

        # Re-indent first token on line
        if lc==0:
            if tok==INDENT:
                continue    # we want to dedent first actual token
            else:
                curindent = len(line[:sc].expandtabs())
                if baseindent is None and tok not in WHITESPACE:
                    baseindent = curindent
                elif baseindent is not None and curindent>=baseindent:
                    add(' ' * (curindent-baseindent))
            if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE):
                add(' ' * indent)

        # Not at start of line, handle intraline whitespace by retaining it
        elif sc>lc:
            add(line[lc:sc])

        if val:
            add(val)

        lr,lc,last = er,ec,line

    return ''.join(out)

From guido at python.org Thu Nov 30 19:28:25 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Nov 2006 10:28:25 -0800 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> Message-ID: Are you opposed changing tokenize? If so, why (apart from compatibility)? ISTM that it would be a good thing if it reported everything except horizontal whitespace. On 11/30/06, Phillip J.
Eby wrote: > At 09:49 AM 11/30/2006 -0800, Guido van Rossum wrote: > >I've got a small tweak to tokenize.py that I'd like to run by folks here. > > > >I'm working on a refactoring tool for Python 2.x-to-3.x conversion, > >and my approach is to build a full parse tree with annotations that > >show where the whitespace and comments go. I use the tokenize module > >to scan the input. This is nearly perfect (I can render code from the > >parse tree and it will be an exact match of the input) except for > >continuation lines -- while the tokenize gives me pseudo-tokens for > >comments and "ignored" newlines, it doesn't give me the backslashes at > >all (while it does give me the newline following the backslash). > > The following routine will render a token stream, and it automatically > restores the missing \'s. I don't know if it'll work with your patch, but > perhaps you could use it instead of changing tokenize. For the > documentation and examples, see: > > http://peak.telecommunity.com/DevCenter/scale.dsl#converting-tokens-back-to-text > > > def detokenize(tokens, indent=0): > """Convert `tokens` iterable back to a string.""" > out = []; add = out.append > lr,lc,last = 0,0,'' > baseindent = None > for tok, val, (sr,sc), (er,ec), line in flatten_stmt(tokens): > # Insert trailing line continuation and blanks for skipped lines > lr = lr or sr # first line of input is first line of output > if sr>lr: > if last: > if len(last)>lc: > add(last[lc:]) > lr+=1 > if sr>lr: > add(' '*indent + '\\\n'*(sr-lr)) # blank continuation lines > lc = 0 > > # Re-indent first token on line > if lc==0: > if tok==INDENT: > continue # we want to dedent first actual token > else: > curindent = len(line[:sc].expandtabs()) > if baseindent is None and tok not in WHITESPACE: > baseindent = curindent > elif baseindent is not None and curindent>=baseindent: > add(' ' * (curindent-baseindent)) > if indent and tok not in (DEDENT, ENDMARKER, NL, NEWLINE): > add(' ' * indent) > > # Not at start of 
line, handle intraline whitespace by retaining it > elif sc>lc: > add(line[lc:sc]) > > if val: > add(val) > > lr,lc,last = er,ec,line > > return ''.join(out) > > -- --Guido van Rossum (home page: http://www.python.org/~guido/) From fredrik at pythonware.com Thu Nov 30 19:34:41 2006 From: fredrik at pythonware.com (Fredrik Lundh) Date: Thu, 30 Nov 2006 19:34:41 +0100 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> Message-ID: Guido van Rossum wrote: > Are you opposed changing tokenize? If so, why (apart from > compatibility)? ISTM that it would be a good thing if it reported > everything except horizontal whitespace. it would be a good thing if it could, optionally, be made to report horizontal whitespace as well. From pje at telecommunity.com Thu Nov 30 19:55:44 2006 From: pje at telecommunity.com (Phillip J. Eby) Date: Thu, 30 Nov 2006 13:55:44 -0500 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> Message-ID: <5.1.1.6.0.20061130135127.0287fa38@sparrow.telecommunity.com> At 10:28 AM 11/30/2006 -0800, Guido van Rossum wrote: >Are you opposed changing tokenize? If so, why (apart from >compatibility)? Nothing apart from compatibility. I think you should have to explicitly request the new behavior(s), since tools (like detokenize) written to work around the old behavior might behave oddly with the change. Mainly, though, I thought you might find the code useful, given the nature of your project. (Although I suppose you've probably already written something similar.) From python at rcn.com Thu Nov 30 20:12:16 2006 From: python at rcn.com (python at rcn.com) Date: Thu, 30 Nov 2006 14:12:16 -0500 (EST) Subject: [Python-Dev] Small tweak to tokenize.py? 
Message-ID: <20061130141216.AOY93108@ms09.lnh.mail.rcn.net> > It would be trivial to add another yield to tokenize.py when > the backslash is detected +1 > I think that it should probably yield a single NL pseudo-token > whose value is a backslash followed by a newline; or perhaps it > should yield the backslash as a comment token, or as a new token. The first option is likely the most compatible with existing uses of tokenize. If a comment token were emitted, an existing colorizer or pretty-printer would mark up the continuation as a comment (possibly not what the tool author intended). If a new token were created, it might break if-elif-else chains in tools that thought they knew the universe of possible token types. Raymond From lists at janc.be Thu Nov 30 21:05:49 2006 From: lists at janc.be (Jan Claeys) Date: Thu, 30 Nov 2006 21:05:49 +0100 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061129112354.GA30665@code0.codespeak.net> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <20061129112354.GA30665@code0.codespeak.net> Message-ID: <1164917149.31269.344.camel@localhost> On Wednesday 2006-11-29 at 12:23 [timezone +0100], Armin Rigo wrote: > I could not agree more. Nowadays, whenever I get an account on a new > Linux machine, the first thing I have to do is reinstall Python > correctly in my home dir because the system Python lacks distutils. > Wasteful. (There are some applications and libraries that use > distutils at run-time to compile things, and I'm using such > applications and libraries on a daily basis.) I think you should blame the sysadmins, and kick them to install python properly for use by a developer, because every distro I know provides distutils...
;-) -- Jan Claeys From guido at python.org Thu Nov 30 22:46:01 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Nov 2006 13:46:01 -0800 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: <5.1.1.6.0.20061130135127.0287fa38@sparrow.telecommunity.com> References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> <5.1.1.6.0.20061130135127.0287fa38@sparrow.telecommunity.com> Message-ID: On 11/30/06, Phillip J. Eby wrote: > At 10:28 AM 11/30/2006 -0800, Guido van Rossum wrote: > >Are you opposed changing tokenize? If so, why (apart from > >compatibility)? > > Nothing apart from compatibility. I think you should have to explicitly > request the new behavior(s), since tools (like detokenize) written to work > around the old behavior might behave oddly with the change. Can you test it with this new change (slightly different from before)? It reports an NL pseudo-token with '\\\n' as its text value (or '\\\r\n' if the line ends in \r\n).

@@ -370,6 +370,8 @@
                 elif initial in namechars:      # ordinary name
                     yield (NAME, token, spos, epos, line)
                 elif initial == '\\':           # continued stmt
+                    # This yield is new; needed for better idempotency:
+                    yield (NL, token, spos, (lnum, pos), line)
                     continued = 1
                 else:
                     if initial in '([{': parenlev = parenlev + 1

> Mainly, though, I thought you might find the code useful, given the nature > of your project. (Although I suppose you've probably already written > something similar.) Indeed.
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From steve at holdenweb.com Thu Nov 30 22:48:53 2006 From: steve at holdenweb.com (Steve Holden) Date: Thu, 30 Nov 2006 21:48:53 +0000 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <1164917149.31269.344.camel@localhost> References: <5.1.1.6.0.20061126143124.027f3e68@sparrow.telecommunity.com> <5.1.1.6.0.20061127090652.04341cf8@sparrow.telecommunity.com> <200611290053.17199.anthony@interlink.com.au> <20061129112354.GA30665@code0.codespeak.net> <1164917149.31269.344.camel@localhost> Message-ID: <456F51C5.8080500@holdenweb.com> Jan Claeys wrote: > Op woensdag 29-11-2006 om 12:23 uur [tijdzone +0100], schreef Armin > Rigo: >> I could not agree more. Nowadays, whenever I get an account on a new >> Linux machine, the first thing I have to do is reinstall Python >> correctly in my home dir because the system Python lacks distutils. >> Wasteful. (There are some applications and libraries that use >> distutils at run-time to compile things, and I'm using such >> applications and libraries on a daily basis.) > > I think you should blame the sysadmins, and kick them to install python > properly for use by a developer, because every distro I know provides > distutils... ;-) > > I think the point is that some distros (Debian is the one that springs to mind most readily, but I'm not a distro archivist) require a separate install for distutils even though it's been a part of the standard *Python* distro since 2.3 (2.2?) So, it isn't that you can't get distutils, it's that you have to take an extra step over and above installing Python. 
regards Steve -- Steve Holden +44 150 684 7255 +1 800 494 3119 Holden Web LLC/Ltd http://www.holdenweb.com Skype: holdenweb http://holdenweb.blogspot.com Recent Ramblings http://del.icio.us/steve.holden From guido at python.org Thu Nov 30 22:49:30 2006 From: guido at python.org (Guido van Rossum) Date: Thu, 30 Nov 2006 13:49:30 -0800 Subject: [Python-Dev] Small tweak to tokenize.py? In-Reply-To: References: <5.1.1.6.0.20061130131522.033863a0@sparrow.telecommunity.com> Message-ID: On 11/30/06, Fredrik Lundh wrote: > Guido van Rossum wrote: > > > Are you opposed changing tokenize? If so, why (apart from > > compatibility)? ISTM that it would be a good thing if it reported > > everything except horizontal whitespace. > > it would be a good thing if it could, optionally, be made to report > horizontal whitespace as well. It's remarkably easy to get this out of the existing API; keep track of the end position returned by the previous call, and if it's different from the start position returned by the next call, slice the line text from the column positions, assuming the line numbers are the same. If the line numbers differ, something has been eating \n tokens; this shouldn't happen any more with my patch. 
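Guido's recipe (remember the previous token's end position; when the next token starts further along on the same row, slice the missing whitespace out of the line text) can be sketched against today's tokenize module. A minimal illustration, not code from the thread:

```python
import io
import tokenize

def rebuild(source):
    """Reconstruct source text from its token stream, recovering
    horizontal whitespace by comparing adjacent token positions."""
    out = []
    prev_row, prev_col = 1, 0
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        srow, scol = tok.start
        # Same physical line with a gap before this token: the gap is
        # horizontal whitespace, recoverable by slicing the line text.
        if srow == prev_row and scol > prev_col:
            out.append(tok.line[prev_col:scol])
        out.append(tok.string)
        prev_row, prev_col = tok.end
    return "".join(out)

assert rebuild("x  =  1 + 2  # spaced out\n") == "x  =  1 + 2  # spaced out\n"
```

Note that this round-trips comments, indentation, and intraline spacing, but still cannot recover a backslash continuation (the rows differ, and no token carries the backslash), which is exactly the gap the patch addresses.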
-- --Guido van Rossum (home page: http://www.python.org/~guido/) From sluggoster at gmail.com Thu Nov 30 23:46:22 2006 From: sluggoster at gmail.com (Mike Orr) Date: Thu, 30 Nov 2006 14:46:22 -0800 Subject: [Python-Dev] Python and the Linux Standard Base (LSB) In-Reply-To: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> References: <20061130032036.11053.1356768333.divmod.xquotient.888@joule.divmod.com> Message-ID: <6e9196d20611301446y629c04cdn6c8215dfe065d006@mail.gmail.com> On 11/29/06, glyph at divmod.com wrote: > The major advantage ~/.local has for *nix systems is the ability to have a > parallel *bin* directory, which provides the user one location to set their > $PATH to, so that installed scripts work as expected, rather than having to > edit a bunch of .foorc files to add to your environment with each additional > package. After all, what's the point of a per-user "install" if the > software isn't actually installed in any meaningful way, and you have to > manually edit your shell startup scripts, log out and log in again anyway? > Another nice feature there is that it uses a pre-existing layout convention > (bin lib share etc ...) rather than attempting to build a new one, so the > only thing that has to change about the package installation is the root. Putting programs and libraries in a hidden directory? Things the user intends to run or inspect? Putting a hidden directory on $PATH? I'm... stunned. It sounds like a very bad idea. Dotfiles are for a program's internal state: "black box" stuff. Not programs the user will run, and not Python modules he may want to inspect or subclass. ~/bin and ~/lib already work well with both Virtual Python and ./configure, and it's what many users are already doing. On the other hand, the freedesktop link says ~/.local can be overridden with environment variables. That may be an acceptable compromise between the two. 
Speaking of Virtual Python [1], I've heard some people recommending it as a general solution to the "this library breaks that other application" problem and "this app needs a different version of X library than that other app does". I've started using it off and on but haven't come to any general conclusion on it. Is it becoming pretty widespread among Python users? Would it be worth mentioning in the LSB/FHS? It only works on *nix systems currently, but Linux is a *nix system anyway. [1] http://peak.telecommunity.com/dist/virtual-python.py (It installs a pseudo-copy of Python symlinked to the system one, so that you have your own site-packages directory independent of others.) -- Mike Orr
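The idea behind virtual-python.py can be shown in a few shell commands. This is only a conceptual sketch of the mechanism (a private prefix whose bin/python is a symlink to the system interpreter, giving you your own package directory), not the actual script, and the paths are assumptions:

```shell
# Hypothetical prefix for the private environment (assumption, not from the thread)
PREFIX="${TMPDIR:-/tmp}/py-sandbox"

# Create the conventional layout under the prefix
mkdir -p "$PREFIX/bin" "$PREFIX/lib/python/site-packages"

# Symlink the system interpreter into the private bin directory
ln -sf "$(command -v python3 || command -v python)" "$PREFIX/bin/python"

# The symlinked interpreter runs normally
"$PREFIX/bin/python" -c "import sys; print(sys.version_info[0])"
```

The real script also arranges for the private site-packages to take precedence on sys.path; putting $PREFIX/bin on $PATH then gives the per-user install experience discussed earlier in the thread.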