From noreply at sourceforge.net Mon Jan 1 07:25:14 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 31 Dec 2006 22:25:14 -0800 Subject: [Patches] [ python-Patches-1620174 ] Improve platform.py usability on Windows Message-ID: Patches item #1620174, was opened at 2006-12-21 22:49 Message generated for change (Comment added) made by infidel You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1620174&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Luke Dunstan (infidel) Assigned to: M.-A. Lemburg (lemburg) Summary: Improve platform.py usability on Windows Initial Comment: This patch modifies platform.py to remove most of the dependencies on pywin32, and use the standard ctypes and _winreg modules instead. It also adds support for Windows CE. ---------------------------------------------------------------------- >Comment By: Luke Dunstan (infidel) Date: 2007-01-01 14:25 Message: Logged In: YES user_id=30442 Originator: YES Why does platform.py need to be compatible with earlier versions of Python? The return types haven't changed, and I think the return values won't change because the same OS APIs are being used. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-01 02:49 Message: Logged In: YES user_id=38388 Originator: NO I haven't looked at the patch yet, so just a few general comments on changes to platform.py: * the code must continue to work with Python versions prior to 2.6 This means that ctypes and _winreg support may be added as an option, but removing pywin32 calls is not the right way to proceed. * changes in return type of the public and documented APIs are not possible If you have a need for more information, then a new API should be added, or the information merged into one of the existing return fields. * changes in the return values of APIs due to use of different OS APIs must be avoided There's code out there relying on the return values, so if in doubt a new API must be provided. ---------------------------------------------------------------------- Comment By: Luke Dunstan (infidel) Date: 2006-12-31 13:57 Message: Logged In: YES user_id=30442 Originator: YES 1. Yes this is intended for 2.6 2. The only difference between win32api.RegQueryValueEx and _winreg.QueryValueEx seems to be that the latter returns Unicode strings. I have adjusted the patch to be more compatible with the old behaviour. 3. I have updated the doc string in the new patch. File Added: platform-wince-2.diff ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-31 08:13 Message: Logged In: YES user_id=764593 Originator: NO ( win32api.RegQueryValueEx is _winreg.QueryValueEx ) ? If not, it should wait for 2.6, and there should be an entry in what's new. (I suppose similar concerns exist for other return classes.) The change to win32_ver only half-corrects the return type to the four-tuple. The meaning of release (even if it is just "release name") should be specified in the text. 
def win32_ver(release='',version='',csd='',ptype=''):
    """ Get additional version information from the Windows Registry
-       and return a tuple (version,csd,ptype) referring to version
+       and return a tuple (release,version,csd,ptype) referring to version
        number, CSD level and OS type (multi/single processor).

---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1620174&group_id=5470 From noreply at sourceforge.net Wed Jan 3 00:50:23 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 02 Jan 2007 15:50:23 -0800 Subject: [Patches] [ python-Patches-1626538 ] update to PEP 344 - exception attributes Message-ID: Patches item #1626538, was opened at 2007-01-02 18:50 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Nobody/Anonymous (nobody) Summary: update to PEP 344 - exception attributes Initial Comment: PEP 344 proposes adding __traceback__, __context__, and __cause__ attributes to Exception. The primary objection has been that the __traceback__ attribute would cause a cycle, which would delay resource release. This objection is now added to the PEP, along with some details about why it is a problem, and why weakrefs aren't a straightforward solution.
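For concreteness, a minimal sketch (mine, not from the patch or the PEP text) of the resource-release problem: an exception that carries its own traceback keeps the traceback's frames, and hence their local variables, alive for as long as the exception itself is referenced. Runnable on released Python 3:

    class Witness:
        deleted = False
        def __del__(self):
            Witness.deleted = True

    def boom():
        w = Witness()              # stands in for a scarce resource
        try:
            1 / 0
        except ZeroDivisionError as exc:
            return exc             # exception leaves with its traceback attached

    e = boom()
    print('w' in e.__traceback__.tb_frame.f_locals)  # True: frame locals still alive
    print(Witness.deleted)         # False: the "resource" is not yet released
    del e                          # dropping the exception frees the frames...
    print(Witness.deleted)         # True: ...and only now is the resource released

If the handler's local binding (exc above) also survived, exception -> traceback -> frame -> exc would form a true cycle, collectable only by the garbage collector; released Python 3 implicitly deletes the except-block name on exit for exactly that reason.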
---------------------------------------------------------------------- >Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 18:56 Message: Logged In: YES user_id=764593 Originator: YES http://mail.python.org/pipermail/python-3000/2007-January/005322.html Guido said he could check it in if Ping agrees, so I'm assigning the patch to ping (who I *hope* is Ka-Ping Yee) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 From noreply at sourceforge.net Wed Jan 3 01:00:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 02 Jan 2007 16:00:22 -0800 Subject: [Patches] [ python-Patches-1626538 ] update to PEP 344 - exception attributes Message-ID: Patches item #1626538, was opened at 2007-01-02 18:50 Message generated for change (Comment added) made by jimjjewett You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Ka-Ping Yee (ping) Summary: update to PEP 344 - exception attributes Initial Comment: PEP 344 proposes adding __traceback__, __context__, and __cause__ attributes to Exception. The primary objection has been that the __traceback__ exception would cause a cycle, which would delay resource release. This objection is now added to the PEP, along with some details about why it is a problem, and why weakrefs aren't a straightforward solution. ---------------------------------------------------------------------- >Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 19:00 Message: Logged In: YES user_id=764593 Originator: YES File Added: pep344diff.txt ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 18:56 Message: Logged In: YES user_id=764593 Originator: YES http://mail.python.org/pipermail/python-3000/2007-January/005322.html Guido said he could check it in if Ping agrees, so I'm assigning the patch to ping (who I *hope* is Ka-Ping Yee) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 From noreply at sourceforge.net Wed Jan 3 01:30:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 02 Jan 2007 16:30:33 -0800 Subject: [Patches] [ python-Patches-1626538 ] update to PEP 344 - exception attributes Message-ID: Patches item #1626538, was opened at 2007-01-02 15:50 Message generated for change (Comment added) made by ping You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Ka-Ping Yee (ping) Summary: update to PEP 344 - exception attributes Initial Comment: PEP 344 proposes adding __traceback__, __context__, and __cause__ attributes to Exception. 
The primary objection has been that the __traceback__ exception would cause a cycle, which would delay resource release. This objection is now added to the PEP, along with some details about why it is a problem, and why weakrefs aren't a straightforward solution. ---------------------------------------------------------------------- >Comment By: Ka-Ping Yee (ping) Date: 2007-01-02 16:30 Message: Logged In: YES user_id=45338 Originator: NO Okay, it will take me a moment to page this back into my head and respond. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 16:00 Message: Logged In: YES user_id=764593 Originator: YES File Added: pep344diff.txt ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 15:56 Message: Logged In: YES user_id=764593 Originator: YES http://mail.python.org/pipermail/python-3000/2007-January/005322.html Guido said he could check it in if Ping agrees, so I'm assigning the patch to ping (who I *hope* is Ka-Ping Yee) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 From noreply at sourceforge.net Wed Jan 3 16:21:36 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 07:21:36 -0800 Subject: [Patches] [ python-Patches-1627052 ] backticks will not be used at all Message-ID: Patches item #1627052, was opened at 2007-01-03 10:21 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Nobody/Anonymous (nobody) Summary: backticks will not be used at all Initial Comment: In python 3, backticks will not mean repr. Every few months, someone suggests a new meaning for them. This clarifies that they won't be reused at all. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 From noreply at sourceforge.net Wed Jan 3 16:22:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 07:22:51 -0800 Subject: [Patches] [ python-Patches-1627052 ] backticks will not be used at all Message-ID: Patches item #1627052, was opened at 2007-01-03 10:21 Message generated for change (Settings changed) made by jimjjewett You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) >Assigned to: Georg Brandl (gbrandl) Summary: backticks will not be used at all Initial Comment: In python 3, backticks will not mean repr. Every few months, someone suggests a new meaning for them. This clarifies that they won't be reused at all. 
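For readers who never used the spelling in question, a two-line illustration (mine, not from the patch) of what is being removed and what replaces it:

    value = [1, 2, 3]
    # Python 2 allowed:  s = `value`   -- shorthand for repr(value)
    s = repr(value)      # the only spelling in Python 3; backticks are a SyntaxError
    print(s)             # prints: [1, 2, 3]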
---------------------------------------------------------------------- >Comment By: Jim Jewett (jimjjewett) Date: 2007-01-03 10:22 Message: Logged In: YES user_id=764593 Originator: YES Assigning to PEP owner, Georg. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 From noreply at sourceforge.net Thu Jan 4 00:46:07 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 15:46:07 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 23:46 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399 Definitely a backport candidate. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Thu Jan 4 03:53:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 18:53:30 -0800 Subject: [Patches] [ python-Patches-1626538 ] update to PEP 344 - exception attributes Message-ID: Patches item #1626538, was opened at 2007-01-02 15:50 Message generated for change (Comment added) made by ping You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Ka-Ping Yee (ping) Summary: update to PEP 344 - exception attributes Initial Comment: PEP 344 proposes adding __traceback__, __context__, and __cause__ attributes to Exception. The primary objection has been that the __traceback__ exception would cause a cycle, which would delay resource release. This objection is now added to the PEP, along with some details about why it is a problem, and why weakrefs aren't a straightforward solution. ---------------------------------------------------------------------- >Comment By: Ka-Ping Yee (ping) Date: 2007-01-03 18:53 Message: Logged In: YES user_id=45338 Originator: NO I've checked in this change. Thanks for writing the patch. ---------------------------------------------------------------------- Comment By: Ka-Ping Yee (ping) Date: 2007-01-02 16:30 Message: Logged In: YES user_id=45338 Originator: NO Okay, it will take me a moment to page this back into my head and respond. 
---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 16:00 Message: Logged In: YES user_id=764593 Originator: YES File Added: pep344diff.txt ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 15:56 Message: Logged In: YES user_id=764593 Originator: YES http://mail.python.org/pipermail/python-3000/2007-January/005322.html Guido said he could check it in if Ping agrees, so I'm assigning the patch to ping (who I *hope* is Ka-Ping Yee) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 From noreply at sourceforge.net Thu Jan 4 04:59:16 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 19:59:16 -0800 Subject: [Patches] [ python-Patches-1624059 ] fast subclasses of builtin types Message-ID: Patches item #1624059, was opened at 2006-12-29 01:01 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Guido van Rossum (gvanrossum) Summary: fast subclasses of builtin types Initial Comment: This is similar to a patch posted on python-dev a few months ago (or more). I modified it to also handle subclassing exceptions, which should speed up exception handling a bit. (This was proposed by Guido based on the original patch.) I also dropped an extra bit that was going to indicate if it was a builtin type or a subclass of a builtin type. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-03 22:59 Message: Logged In: YES user_id=6380 Originator: NO This looks fine, but I have some questions about alternative implementations:

- Why does the typical PyFoo_Check() macro first call PyFoo_CheckExact() before calling the fast bit checking macro? Did you measure that this is in fact faster? True, it means always a pointer deref, so maybe it is -- but OTOH it is more instructions.

- Why not have a separate bit for each type? Then you could make the fast macro test for (flags & mask) != 0 instead of testing for (flags & mask) == value. It would use up all the remaining bits, but I suspect there are some unused (or reusable) bits in lower positions: 1L<<2 is unused (was GC), and 1L<<11 also seems unused. And bits 18 through 23! And I'm guessing that INPLACEOPS (1L<<3) isn't all that interesting any more; they were introduced in 2.0... So it really looks like you have plenty of bits. Of course I don't know if it matters; it would perhaps be worth looking at the machine code.

- Oops, it looks like your comment is off. You claim to be using bits 24-27, leaving 28-31 free, but in fact you're using bits 28-31!

BTW, you're introducing quite a few lines over 80 chars. Perhaps cut back a bit?
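A toy model in Python (bit positions are hypothetical, not CPython's actual layout) of the two designs being compared above: one bit per builtin type, tested with (flags & mask) != 0, versus a packed value field, tested with (flags & mask) == value:

    # One bit per type: a single AND answers "is this a subclass of list?"
    FLAG_INT_SUBCLASS  = 1 << 23     # hypothetical positions
    FLAG_LONG_SUBCLASS = 1 << 24
    FLAG_LIST_SUBCLASS = 1 << 25

    def check_bit(tp_flags, mask):
        return (tp_flags & mask) != 0

    flags_of_list_subclass = FLAG_LIST_SUBCLASS
    print(check_bit(flags_of_list_subclass, FLAG_LIST_SUBCLASS))  # True
    print(check_bit(flags_of_list_subclass, FLAG_INT_SUBCLASS))   # False

    # Packed field: a small type id in four bits needs an exact compare.
    FIELD_MASK = 0xF << 28
    LIST_ID    = 0x3 << 28           # hypothetical id

    def check_field(tp_flags, value):
        return (tp_flags & FIELD_MASK) == value

    print(check_field(LIST_ID, LIST_ID))                          # True

Either way, the point of the patch stands: the subclass check becomes a flag test on the type object instead of a call through PyType_IsSubtype().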
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-29 01:04 Message: Logged In: YES user_id=33168 Originator: YES I forgot to mention that this patch works by using unused bits in tp_flags. This saves a function call when checking for a subclass of a builtin type. There's one funky thing about this patch: the change to Objects/exceptions.c. I didn't investigate why this was necessary, or more likely I did know why when I added it and forgot. I know that without adding BASE_EXC_SUBCLASS to tp_flags, test_exceptions fails. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 From noreply at sourceforge.net Thu Jan 4 05:30:54 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 20:30:54 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 15:53 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-03 23:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 01:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-27 22:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things.
The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-27 20:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-27 20:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-27 19:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session:

    >>> x = object()
    >>> import sys
    >>> sys.getrefcount(x)
    2
    >>> for i in range(100):
    ...     def f(x: x): pass
    ...
    >>> del f
    >>> sys.getrefcount(x)
    102
    >>>

At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 14:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for:

    def intmin(*a: int) -> int: pass

...help(intmin) will display:

    intmin(*a: int) -> int

File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 10:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 16:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 15:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch.
File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 15:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 17:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 13:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 04:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 03:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? 
(Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-19 20:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 12:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-03 20:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
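The non-nested part of this syntax is runnable on released Python 3, where the patch's func_annotations attribute ended up spelled __annotations__ (and tuple parameters were later dropped entirely). A short demonstration:

    def f(a: int, b: "a label" = 0) -> str:
        return str(a + b)

    # Argument names map to the evaluated annotation expressions;
    # the return annotation is stored under the key 'return'.
    print(f.__annotations__)
    # {'a': <class 'int'>, 'b': 'a label', 'return': <class 'str'>}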
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 05:57:08 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 20:57:08 -0800 Subject: [Patches] [ python-Patches-1548388 ] set comprehensions Message-ID: Patches item #1548388, was opened at 2006-08-29 04:33 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1548388&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Georg Brandl (gbrandl) Assigned to: Georg Brandl (gbrandl) Summary: set comprehensions Initial Comment: This is a big one: * cleanup grammar; unifies listcomp/genexp grammar which means that [x for x in 1, 2] is no longer valid * cleanup comprehension compiling code (unifies all AST code for the three comprehensions and most of the compile.c code) * add set comprehensions This patch modifies list comprehensions to be implemented more like generator expressions: in a separate function, which means that the loop variables will not leak any more. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-03 23:57 Message: Logged In: YES user_id=6380 Originator: NO There was some discussion on the py3k list about Raymond's suggestion. Are you thinking of doing that? I'd really like to see the syntactic changes and additions from this patch, but I agree that for list/set comps we can do without the extra stack frame. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-09-08 13:13 Message: Logged In: YES user_id=80475 Genexps necessarily need a separate stack frame to achieve saved execution state (including the instruction pointer and local variable). Also, it was simplest to implement genexps in terms of the existing and proven code for regular generators. For list and set comps, I think you can take a simpler approach and just rename the inner loop variable to something invisible. That will make it faster, make the disassemby readable, and make it easier to follow in pdb. Also, we get to capitalize on proven code -- they only difference is that the induction variable won't be visible to surrounding code. Since what you have works, I would say just check it in; however, it would probably never get touched again and an early, arbitrary design choice would get set in stone. My bet is that the renaming approach will result in a much simpler patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-09-06 11:36 Message: Logged In: YES user_id=6380 I always assumed that the genexps *require* being a function because that's the only way to create a generator. But that argument doesn't apply to listcomps. That's about all I know of the implementation of these.. :-( Have you asked python-dev? 
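The observable behavior the patch is after (and that released Python 3 adopted) can be summarized in a few lines:

    squares = {x * x for x in range(5)}   # set comprehension
    print(squares)                        # {0, 1, 4, 9, 16}

    # The comprehension runs in its own scope, so the loop
    # variable no longer leaks into the surrounding namespace:
    try:
        x
    except NameError:
        print("x did not leak")

    # And the unified grammar rejects the bare tuple form:
    #   [i for i in 1, 2]                 # SyntaxError
    #   [i for i in (1, 2)]               # write this instead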
---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-09-06 03:03 Message: Logged In: YES user_id=849994 It is complete, it works and it does not leak the loop variable(s). The question is whether it is okay for listcomps and setcomps to be in their own anonymous function, which slows listcomps down compared to the 2.x branch. I don't know why the function approach was taken for genexps, but I suspect it was because who implemented it then saw this as the best way to hide the loop variable. Perhaps somebody else more familiar with the internals and the previous discussions can look over it. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-09-06 02:48 Message: Logged In: YES user_id=6380 Do you think this is ready to be checked in, or are you still working on it? ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-09-01 05:38 Message: Logged In: YES user_id=849994 Since you can put anything usable as an assignment target after the "for" of a listcomp, just renaming might be complicated. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-08-31 19:40 Message: Logged In: YES user_id=6380 +1. Would this cause problems for abominations like this though? >>> a=[1] >>> list(tuple(a) for a[0] in "abc") [('a',), ('b',), ('c',)] >>> a ['c'] >>> ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-08-31 19:15 Message: Logged In: YES user_id=80475 Would it be an oversimplfication for list and set comps to keep everything in one code block and just hide the list loop variables by renaming them: x --> __[x] That approach would only require a minimal patch, and it would make for a cleaner disassembly. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-08-31 15:55 Message: Logged In: YES user_id=849994 Attaching slightly revised patch and bytecode comparison. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-08-29 18:30 Message: Logged In: YES user_id=80475 Can you post a before and disassembly of some list and set comprehensions. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-08-29 15:09 Message: Logged In: YES user_id=849994 test_compiler and test_transformer fail because the compiler package hasn't been updated yet. test_dis fails because list comprehensions now generate different bytecode. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-08-29 13:59 Message: Logged In: YES user_id=6380 Nice! I see failures in 4 tests: test_compiler test_dis test_transformer test_univnewlines test_univnewlines is trivial (it's deleting a variable leaked out of a list comprehension); haven't looked at the rest in detail ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-08-29 04:34 Message: Logged In: YES user_id=849994 The previously attached patch contains only the important files. The FULL patch (attached now) also contains syntax fixes in python files so that the test suite is mostly passing. 
Note that the compiler package isn't ready yet. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1548388&group_id=5470 From noreply at sourceforge.net Thu Jan 4 06:17:01 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 21:17:01 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 05:17 Message: Logged In: YES user_id=24100 Originator: YES I think everyone will have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 06:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 03:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things.
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 01:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 01:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 00:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session: >>> x = object() >>> import sys >>> sys.getrefcount(x) 2 >>> for i in range(100): ... def f(x: x): pass ... >>> del f >>> sys.getrefcount(x) 102 >>> At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 19:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for: def intmin(*a: int) -> int: pass ...help(intmin) will display: intmin(*a: int) -> int File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 15:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 21:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 20:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 20:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. 
test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 22:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 18:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 09:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 08:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? (Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? 
I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-20 01:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 17:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-04 01:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
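On the getargspec question debated in this thread: what released Python 3 eventually did matches Guido's suggestion -- the old functions were left alone, and a new one, inspect.getfullargspec(), was added to expose keyword-only arguments and annotations (the demo function below is illustrative):

    import inspect

    def f(a, *, b: int = 0) -> str:
        return str(a)

    spec = inspect.getfullargspec(f)
    print(spec.args)            # ['a']  -- positional arguments only
    print(spec.kwonlyargs)      # ['b']  -- invisible to the old getargspec
    print(spec.kwonlydefaults)  # {'b': 0}
    print(spec.annotations)     # {'b': <class 'int'>, 'return': <class 'str'>}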
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 06:22:47 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 21:22:47 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 15:53 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 00:22 Message: Logged In: YES user_id=6380 Originator: NO Well, it depends on the context whether that matters. The kw-only args could just be included in the positional args (which have names anyway) and that wouldn't be so bad for some apps. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 00:17 Message: Logged In: YES user_id=24100 Originator: YES I think everyone should update have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-03 23:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 01:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-27 22:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things. 
The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-27 20:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-27 20:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-27 19:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session: >>> x = object() >>> import sys >>> sys.getrefcount(x) 2 >>> for i in range(100): ... def f(x: x): pass ... >>> del f >>> sys.getrefcount(x) 102 >>> At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 14:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for: def intmin(*a: int) -> int: pass ...help(intmin) will display: intmin(*a: int) -> int File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 10:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 16:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 15:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch. 
File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 15:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 17:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 13:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 04:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 03:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? 
(Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-19 20:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 12:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-03 20:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
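A hypothetical session illustrating why the func_annotations dictionary is built when MAKE_FUNCTION runs rather than baked into the code object: annotation expressions are evaluated in the enclosing scope, so each execution of a nested def can produce different annotation values.

>>> def make_check(expected):
...     def check(value: expected) -> "matches":
...         return value == expected
...     return check
...
>>> make_check("red").func_annotations['value']
'red'
>>> make_check("blue").func_annotations['value']
'blue'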
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 07:55:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 22:55:56 -0800 Subject: [Patches] [ python-Patches-1494140 ] Documentation for new Struct object Message-ID: Patches item #1494140, was opened at 2006-05-24 02:26 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1494140&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 2.5 Status: Open Resolution: None Priority: 6 Private: No Submitted By: Bob Ippolito (etrepum) Assigned to: Nobody/Anonymous (nobody) Summary: Documentation for new Struct object Initial Comment: The performance enhancements to the struct module (patch #1493701) are implemented by having a Struct object, which is a compiled structure. This text file documents these new struct objects. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-03 22:55 Message: Logged In: YES user_id=33168 Originator: NO Even if this only documents part of the API, it seems like it would be better to get that in and finish it off later. Anyone know what's going on with this? ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-10-29 01:28 Message: Logged In: YES user_id=849994 What's the status of this? It should have been in 2.5 final... ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-08-02 00:38 Message: Logged In: YES user_id=849994 New/renamed functions need a \versionadded/changed. For StructObjects, I'd suggest a sentence like "Struct objects are new in version 2.5" at the top of the section. There's no explanation how to create a Struct object. The constructor must be explained, preferably on the module overview page. Isn't the type name "Struct"? ---------------------------------------------------------------------- Comment By: George Yoshida (quiver) Date: 2006-07-30 10:33 Message: Logged In: YES user_id=671362 > Does this patch still need to be updated for pack_to() I suppose so and hence updated my patch. (1) document pack_into(pack_to is renamed to pack_into). (2) document pack_into/pack_from as module functions too(just like re module) As for the function name change, I've already updated "what's new in 2.5" in r50985. I guess the patch is ready to be applied. Reviews are appreciated. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2006-07-29 12:28 Message: Logged In: YES user_id=11375 Does this patch still need to be updated for pack_to(), or can it just be applied? ---------------------------------------------------------------------- Comment By: George Yoshida (quiver) Date: 2006-07-10 10:26 Message: Logged In: YES user_id=671362 Patch for the TeX style doc. Bob, can you work on updating the main section right after 2.5 b2? 
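For reference, the object under discussion is the compiled-format type added to the struct module in 2.5; a minimal sketch of its use (values are illustrative):

import struct
import array

s = struct.Struct('>HH')         # compiled format: two big-endian unsigned shorts
data = s.pack(1, 2)              # packs to the 4-byte string '\x00\x01\x00\x02'
assert s.unpack(data) == (1, 2)
assert s.size == 4               # byte length of the packed representation

# pack_into/unpack_from operate on an existing writable buffer at an offset:
buf = array.array('c', '\x00' * 8)
s.pack_into(buf, 4, 1, 2)
assert s.unpack_from(buf, 4) == (1, 2)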
---------------------------------------------------------------------- Comment By: Bob Ippolito (etrepum) Date: 2006-05-26 06:05 Message: Logged In: YES user_id=139309 We're going to need to revise this patch some more to document the new pack_to function (for Martin Blais' hotbuf work) Additionally we'll probably also want to revise the main struct documentation to talk about bounds checking and avoiding the creation of long objects. ---------------------------------------------------------------------- Comment By: Bob Ippolito (etrepum) Date: 2006-05-25 07:32 Message: Logged In: YES user_id=139309 That's clearly a typo. I've attached a new version of the patch that removes those two letters. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-05-24 14:03 Message: Logged In: YES user_id=764593 Shouldn't self.size be the number of bytes required to *pack * the structure? The number required to *unpack* seems like it ought to include tuple overhead and such... ---------------------------------------------------------------------- Comment By: Bob Ippolito (etrepum) Date: 2006-05-24 08:35 Message: Logged In: YES user_id=139309 New patch attached, fixed unpack documentation, added unpack_from method. ---------------------------------------------------------------------- Comment By: Bob Ippolito (etrepum) Date: 2006-05-24 07:54 Message: Logged In: YES user_id=139309 Hold up on this patch, I need to revise it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1494140&group_id=5470 From noreply at sourceforge.net Thu Jan 4 08:12:16 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 23:12:16 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 07:12 Message: Logged In: YES user_id=24100 Originator: YES For getargs and getargvalues, including the names in positional args is an excellent strategy. There are uses (in cgitb) in the stdlib for getargvalues that then wouldn't need to be changed. 
The 2 uses of getargspec in the stdlib (one of which I missed, in DocXMLRPCServer) are both closely followed by formatargspec. I think those APIs should change or information will be lost. Alternatively, a new function (hopefully with a better name than getfullargspec :) could be made and getargspec could retain its API, but raise an error when keyword-only arguments are present.

def getargspec(func):
    args, varargs, kwonlyargs, kwdefaults, varkw, defaults, ann = getfullargspec(func)
    if kwonlyargs:
        raise ValueError("function has keyword-only arguments, use getfullargspec!")
    return args, varargs, varkw, defaults

I'll update the patch to fix getargvalues and DocXMLRPCServer this weekend. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 18:53:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 09:53:56 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 15:53 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP.
The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 12:53 Message: Logged In: YES user_id=6380 Originator: NO I like the following approach: (1) the old API continues to work for all functions, but provides incomplete information (not losing the kw-only args completely, but losing the fact that they are kw-only); (2) add a new API that provides all the relevant information. Maybe the new API should not return a 7-tuple but rather a structure with named attributes; that makes it more future-proof. Sorry, I don't have any good suggestions for new names. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 18:56:38 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 09:56:38 -0800 Subject: [Patches] [ python-Patches-1628061 ] Win32: Fix build when you have TortoiseSVN but no .svn/* Message-ID: Patches item #1628061, was opened at 2007-01-04 17:56 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Fix build when you have TortoiseSVN but no .svn/* Initial Comment: Recent snazzy improvements to the Win32 build system include embedding SVN version information in the builds. This is done by compiling a short C file, make_buildinfo.c, and running the result. make_buildinfo.exe runs the liltingly-named SubWCRev.exe--a tool that comes with TortoiseSVN--over one of the source files, ../Modules/getbuildinfo.c, producing a second file, getbuildinfo2.c. The code is reasonably smart; if you don't have TortoiseSVN, it doesn't bother trying, and just compiles ../Modules/getbuildinfo.c unmodified. However: it blindly assumes that if SubWCRev.exe exists, and the system() call to run it returns 0 or greater, getbuildinfo2.c must have been successfully created. If you have TortoiseSVN, but *don't* have the .svn/... directories in your source tree, system(SubWCRev.exe) returns 0 or greater (seemingly indicating success) but in fact fails and does *not* create getbuildinfo2.c. When it fails in this way I see this inscrutable message in the log: "C:\b\tortoisesvn\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c getbuildinfo2.c SubWCRev : Path '..' ends in '..', which is unsupported for this operation This patch changes make_buildinfo.c so that it calls _stat(getbuildinfo2.c) as a final step. If getbuildinfo2.c exists, it returns true, else it returns false. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 From noreply at sourceforge.net Thu Jan 4 18:59:49 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 09:59:49 -0800 Subject: [Patches] [ python-Patches-1628062 ] Win32: Add bytesobject.c to pythoncore.vcproj Message-ID: Patches item #1628062, was opened at 2007-01-04 17:59 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Build Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Add bytesobject.c to pythoncore.vcproj Initial Comment: Objects/bytesobject.c is a new C source in the distribution, and pythoncore won't build properly without it. This patch adds it for VC7. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 From noreply at sourceforge.net Thu Jan 4 22:37:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 13:37:50 -0800 Subject: [Patches] [ python-Patches-1628205 ] socket.readline() interface doesn't handle EINTR properly Message-ID: Patches item #1628205, was opened at 2007-01-04 13:37 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Maxim Sobolev (sobomax) Assigned to: Nobody/Anonymous (nobody) Summary: socket.readline() interface doesn't handle EINTR properly Initial Comment: The socket.readline() interface doesn't handle EINTR properly. Currently, when EINTR is received, the exception is not handled and all data that has been in the buffer is lost. There is no way to recover that data from the code that uses the interface. Correct behaviour would be to catch EINTR and restart recv(). Patch is attached. Following is a real-world example of how it affects the httplib module:

File "/usr/local/lib/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
File "/usr/local/lib/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
File "/usr/local/lib/python2.4/xmlrpclib.py", line 1131, in request
    errcode, errmsg, headers = h.getreply()
File "/usr/local/lib/python2.4/httplib.py", line 1137, in getreply
    response = self._conn.getresponse()
File "/usr/local/lib/python2.4/httplib.py", line 866, in getresponse
    response.begin()
File "/usr/local/lib/python2.4/httplib.py", line 336, in begin
    version, status, reason = self._read_status()
File "/usr/local/lib/python2.4/httplib.py", line 294, in _read_status
    line = self.fp.readline()
File "/usr/local/lib/python2.4/socket.py", line 325, in readline
    data = recv(1)
error: (4, 'Interrupted system call')

-Maxim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 From noreply at sourceforge.net Fri Jan 5 02:10:07 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 17:10:07 -0800 Subject: [Patches] [ python-Patches-1628061 ] Win32: Fix build when you have TortoiseSVN but no .svn/* Message-ID: Patches item #1628061, was opened at 2007-01-04 18:56 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Build Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Fix build when you have TortoiseSVN but no .svn/* Initial Comment: Recent snazzy improvements to the Win32 build system include embedding SVN version information in the builds. This is done by compiling a short C file, make_buildinfo.c, and running the result. make_buildinfo.exe runs the liltingly-named SubWCRev.exe--a tool that comes with TortoiseSVN--over one of the source files, ../Modules/getbuildinfo.c, producing a second file, getbuildinfo2.c. The code is reasonably smart; if you don't have TortoiseSVN, it doesn't bother trying, and just compiles ../Modules/getbuildinfo.c unmodified. However: it blindly assumes that if SubWCRev.exe exists, and the system() call to run it returns 0 or greater, getbuildinfo2.c must have been successfully created. If you have TortoiseSVN, but *don't* have the .svn/... directories in your source tree, system(SubWCRev.exe) returns 0 or greater (seemingly indicating success) but in fact fails and does *not* create getbuildinfo2.c. When it fails in this way I see this inscrutable message in the log: "C:\b\tortoisesvn\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c getbuildinfo2.c SubWCRev : Path '..' ends in '..', which is unsupported for this operation This patch changes make_buildinfo.c so that it calls _stat(getbuildinfo2.c) as a final step. If getbuildinfo2.c exists, it returns true, else it returns false. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-05 02:10 Message: Logged In: YES user_id=21627 Originator: NO This patch shouldn't be necessary. make_buildinfo2 checks whether there is a .svn subdirectory, and if there is none, it compiles getbuildinfo.c (just like when subwcrev.exe wasn't found). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 From noreply at sourceforge.net Fri Jan 5 02:50:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 17:50:51 -0800 Subject: [Patches] [ python-Patches-1628061 ] Win32: Fix build when you have TortoiseSVN but no .svn/* Message-ID: Patches item #1628061, was opened at 2007-01-04 17:56 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Build Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Fix build when you have TortoiseSVN but no .svn/* Initial Comment: Recent snazzy improvements to the Win32 build system include embedding SVN version information in the builds. This is done by compiling a short C file, make_buildinfo.c, and running the result. make_buildinfo.exe runs the liltingly-named SubWCRev.exe--a tool that comes with TortoiseSVN--over one of the source files, ../Modules/getbuildinfo.c, producing a second file, getbuildinfo2.c. The code is reasonably smart; if you don't have TortoiseSVN, it doesn't bother trying, and just compiles ../Modules/getbuildinfo.c unmodified. However: it blindly assumes that if SubWCRev.exe exists, and the system() call to run it returns 0 or greater, getbuildinfo2.c must have been successfully created. If you have TortoiseSVN, but *don't* have the .svn/... directories in your source tree, system(SubWCRev.exe) returns 0 or greater (seemingly indicating success) but in fact fails and does *not* create getbuildinfo2.c. When it fails in this way I see this inscrutable message in the log: "C:\b\tortoisesvn\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c getbuildinfo2.c SubWCRev : Path '..' ends in '..', which is unsupported for this operation This patch changes make_buildinfo.c so that it calls _stat(getbuildinfo2.c) as a final step. If getbuildinfo2.c exists, it returns true, else it returns false. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-05 01:50 Message: Logged In: YES user_id=364875 Originator: YES Good point. I seem to have goofed up my directory in a very specific way: when I made a copy of the tree, I explicitly did *not* copy the top-level .svn, but I forgot to do anything about the .svn directories in the subdirectories. make_buildinfo is run from the "PCbuild" directory, which still has a ".svn" directory, so the _stat(".svn") call succeeds. But the call to SubWCRev.exe fails because ".." (aka the Python root) doesn't have a ".svn" directory. I assert that the patch won't hurt anything, and will make the build process slightly more tolerant of goof-ups like me. If you prefer, I could submit an alternate patch where the current directory is the Python root and it writes to "PCbuild/getbuildinfo2.c". Or one where the stat checks for "../.svn" instead. Or if you don't want any patch at all, that works too, just close the patch. In the meantime, I'll clean up my build tree. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 From noreply at sourceforge.net Fri Jan 5 16:14:23 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 05 Jan 2007 07:14:23 -0800 Subject: [Patches] [ python-Patches-1520904 ] Fix tests that assume they can write to Lib/test Message-ID: Patches item #1520904, was opened at 2006-07-11 20:53 Message generated for change (Comment added) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1520904&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tests Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Douglas Greiman (dgreiman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix tests that assume they can write to Lib/test Initial Comment: A number of bsddb tests, as well as test_tarfile, create temporary files in Lib/ or {prefix}/lib/pythonX.Y/ . This change uses tempfile.gettempdir() instead. Tested on RedHat 9.0 Linux on x86.
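The fix being described follows a simple pattern; a hypothetical before/after sketch (the file name is invented for illustration):

import os
import tempfile

# Before: the test wrote into the source/installed library tree,
# which may be read-only for the user running the test suite.
# fname = os.path.join('Lib', 'test', 'temp.tar')

# After: write into the system temporary directory instead.
fname = os.path.join(tempfile.gettempdir(), 'temp.tar')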
---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2007-01-05 10:14 Message: Logged In: YES user_id=11375 Originator: NO Can you clarify in what cases test_tarfile writes to the current directory? ---------------------------------------------------------------------- Comment By: Matt Fleming (splitscreen) Date: 2006-08-31 07:07 Message: Logged In: YES user_id=1126061 This looks fine to me, and a worthwhile change. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1520904&group_id=5470 From noreply at sourceforge.net Fri Jan 5 16:52:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 05 Jan 2007 07:52:11 -0800 Subject: [Patches] [ python-Patches-1520904 ] Fix tests that assume they can write to Lib/test Message-ID: Patches item #1520904, was opened at 2006-07-11 20:53 Message generated for change (Comment added) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1520904&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tests Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Douglas Greiman (dgreiman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix tests that assume they can write to Lib/test Initial Comment: A number of bsddb tests, as well as test_tarfile, create temporary files in Lib/ or {prefix}/lib/pythonX.Y/ . This change uses tempfile.gettempdir() instead. Tested on RedHat 9.0 Linux on x86. ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2007-01-05 10:52 Message: Logged In: YES user_id=11375 Originator: NO Committed the Lib/bsddb changes to the trunk in rev. 53264; thanks! That leaves only the tarfile change to commit, but I'd like to understand why it's necessary first. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2007-01-05 10:14 Message: Logged In: YES user_id=11375 Originator: NO Can you clarify in what cases test_tarfile writes to the current directory? ---------------------------------------------------------------------- Comment By: Matt Fleming (splitscreen) Date: 2006-08-31 07:07 Message: Logged In: YES user_id=1126061 This looks fine to me, and a worthwhile change. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1520904&group_id=5470 From noreply at sourceforge.net Sat Jan 6 08:36:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 05 Jan 2007 23:36:11 -0800 Subject: [Patches] [ python-Patches-1628062 ] Win32: Add bytesobject.c to pythoncore.vcproj Message-ID: Patches item #1628062, was opened at 2007-01-04 17:59 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. 
Category: Build Group: Python 3000 Status: Open Resolution: None >Priority: 7 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Add bytesobject.c to pythoncore.vcproj Initial Comment: Objects/bytesobject.c is a new C source in the distribution, and pythoncore won't build properly without it. This patch adds it for VC7. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-06 07:36 Message: Logged In: YES user_id=364875 Originator: YES Bumping the priority so it gets noticed. Fixing the build is mom-and-apple pie stuff, and getting this patch into the official tree will make my life a little more pleasant. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 From noreply at sourceforge.net Sat Jan 6 10:37:36 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 01:37:36 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. 
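As a rough mental model of the "lazy concatenation" idea, here is a pure-Python toy (the real patches work inside the C unicode object; this class and its names are only illustrative):

    # Toy model: a + b records the operands and defers the O(len) copy until
    # the first time the value is actually needed, then caches the result.
    class LazyConcat(object):
        def __init__(self, *pieces):
            self.pieces = list(pieces)       # no character copying here
            self.value = None
        def __add__(self, other):
            return LazyConcat(self, other)   # concatenation stays O(1)
        def __str__(self):
            if self.value is None:
                self.value = "".join(str(p) for p in self.pieces)
                self.pieces = None           # render once, cache forever
            return self.value

For example, str(LazyConcat("spam") + "eggs") is the point where the single deferred copy actually happens.
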
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sat Jan 6 14:49:34 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 05:49:34 -0800 Subject: [Patches] [ python-Patches-1628062 ] Win32: Add bytesobject.c to pythoncore.vcproj Message-ID: Patches item #1628062, was opened at 2007-01-04 18:59 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 7 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Add bytesobject.c to pythoncore.vcproj Initial Comment: Objects/bytesobject.c is a new C source in the distribution, and pythoncore won't build properly without it. This patch adds it for VC7. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 14:49 Message: Logged In: YES user_id=21627 Originator: NO Thanks for the patch. Committed as r53289. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-06 08:36 Message: Logged In: YES user_id=364875 Originator: YES Bumping the priority so it gets noticed. Fixing the build is mom-and-apple pie stuff, and getting this patch into the official tree will make my life a little more pleasant. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 From noreply at sourceforge.net Sat Jan 6 14:50:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 05:50:50 -0800 Subject: [Patches] [ python-Patches-1628061 ] Win32: Fix build when you have TortoiseSVN but no .svn/* Message-ID: Patches item #1628061, was opened at 2007-01-04 18:56 Message generated for change (Settings changed) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 3000 >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Fix build when you have TortoiseSVN but no .svn/* Initial Comment: Recent snazzy improvements to the Win32 build system include embedding SVN version information in the builds. This is done by compiling a short C file, make_buildinfo.c, and running the result. make_buildinfo.exe runs the liltingly-named SubWCRev.exe--a tool that comes with TortoiseSVN--over one of the source files, ../Modules/getbuildinfo.c, producing a second file, getbuildinfo2.c. The code is reasonably smart; if you don't have TortoiseSVN, it doesn't bother trying, and just compiles ../Modules/getbuildinfo.c unmodified. However: it blindly assumes that if SubWCRev.exe exists, and the system() call to run it returns 0 or greater, getbuildinfo2.c must have been successfully created. If you have TortoiseSVN, but *don't* have the .svn/... directories in your source tree, system(SubWCRev.exe) returns 0 or greater (seemingly indicating success) but in fact fails and does *not* create getbuildinfo2.c. When it fails in this way I see this inscrutable message in the log:

    "C:\b\tortoisesvn\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c getbuildinfo2.c
    SubWCRev : Path '..' ends in '..', which is unsupported for this operation

This patch changes make_buildinfo.c so that it calls _stat(getbuildinfo2.c) as a final step. If getbuildinfo2.c exists, it returns true, else it returns false. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 14:50 Message: Logged In: YES user_id=21627 Originator: NO Ok, rejecting the patch then. The typical case would be that you have an exported tree, in which case there wouldn't be any .svn directories at all. As I'm sure you know, you can easily fix your installation by removing the .svn directory from PCBuild also. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-05 02:50 Message: Logged In: YES user_id=364875 Originator: YES Good point. I seem to have goofed up my directory in a very specific way: when I made a copy of the tree, I explicitly did *not* copy the top-level .svn, but I forgot to do anything about the .svn directories in the subdirectories. make_buildinfo is run from the "PCbuild" directory, which still has a ".svn" directory, so the _stat(".svn") call succeeds. But the call to SubWCRev.exe fails because ".." (aka the Python root) doesn't have a ".svn" directory. I assert that the patch won't hurt anything, and will make the build process slightly more tolerant of goof-ups like me. If you prefer, I could submit an alternate patch where the current directory is the Python root and it writes to "PCbuild/getbuildinfo2.c". Or one where the stat checks for "../.svn" instead. Or if you don't want any patch at all, that works too, just close the patch. In the meantime, I'll clean up my build tree. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-05 02:10 Message: Logged In: YES user_id=21627 Originator: NO This patch shouldn't be necessary. make_buildinfo2 checks whether there is a .svn subdirectory, and if there is none, it compiles getbuildinfo.c (just like when subwcrev.exe wasn't found). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 From noreply at sourceforge.net Sat Jan 6 15:24:06 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 06:24:06 -0800 Subject: [Patches] [ python-Patches-1624059 ] fast subclasses of builtin types Message-ID: Patches item #1624059, was opened at 2006-12-29 07:01 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
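For orientation before the thread below: the two check styles being compared there can be pictured with plain integers standing in for the C tp_flags word (Python used purely for illustration; the bit assignments are invented, though the mask values mirror the constants visible in the assembler listings, 0xF0000000 and 0x70000000):

    # Illustrative stand-ins for C tp_flags macros; bit layout is made up.
    SUBCLASS_FIELD_MASK = 0xF0000000   # four high bits reserved for the check
    INT_SUBCLASS_CODE   = 0x70000000   # hypothetical encoded value for "int"
    INT_SUBCLASS_BIT    = 1 << 30      # hypothetical one-bit-per-type flag

    def check_encoded_field(tp_flags):
        # "bit mask enumeration": mask out the field, then compare to a code
        # -- the extra compare matches the four-instruction sequence below.
        return (tp_flags & SUBCLASS_FIELD_MASK) == INT_SUBCLASS_CODE

    def check_single_bit(tp_flags):
        # one bit per type: a single AND tested against zero, the
        # two-instruction sequence below; the cost is one flag bit per type.
        return (tp_flags & INT_SUBCLASS_BIT) != 0
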
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Guido van Rossum (gvanrossum) Summary: fast subclasses of builtin types Initial Comment: This is similar to a patch posted on python-dev a few months ago (or more). I modified it to also handle subclassing exceptions which should speed up exception handling a bit. (This was proposed by Guido based on the original patch.) I also dropped an extra bit that was going to indicate if it was a builtin type or a subclass of a builtin type. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 15:24 Message: Logged In: YES user_id=21627 Originator: NO I made a couple of assembler experiments (see attached a.c), with gcc 4.1 on x86. A "bit mask enumeration" test (f) compiles into four instructions:

    movl 8(%eax), %eax
    andl $-268435456, %eax
    cmpl $1879048192, %eax
    je .L18

(fall-through being the else case) A single bit test of a flag (g) compiles to two instructions:

    testl $-1073741824, 8(%eax)
    je .L9

(fall-through being the if case) Adding an identity test (comparison with the address of a global), followed by a bit mask test (h), compiles into six instructions:

    cmpl $int_type, %eax
    je .L2
    movl 8(%eax), %eax
    andl $-268435456, %eax
    cmpl $1879048192, %eax
    je .L2

(fall-through being the else case) In the common case, only two of these instructions are executed. So all-in-all, I would agree with Guido that adding bit flags is more efficient. However, existing bits cannot be recycled: in existing binary extension modules, these flags are set, so if the modules don't get recompiled, the type check would believe that the types are subtypes. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:59 Message: Logged In: YES user_id=6380 Originator: NO This looks fine, but I have some questions about alternative implementations: - Why does the typical PyFoo_Check() macro first call PyFoo_CheckExact() before calling the fast bit checking macro? Did you measure that this is in fact faster? True, it always means a pointer deref, so maybe it is -- but OTOH it is more instructions. - Why not have a separate bit for each type? Then you could make the fast macro test for (flags & mask) != 0 instead of testing for (flags & mask) == value. It would use up all the remaining bits, but I suspect there are some unused (or reusable) bits in lower positions: 1L<<2 is unused (was GC), and 1L<<11 also seems unused. And bits 18 through 23! And I'm guessing that INPLACEOPS (1L<<3) isn't all that interesting any more; they were introduced in 2.0... So it really looks like you have plenty of bits. Of course I don't know if it matters; would be worth it perhaps to look at the machine code. - Oops, it looks like your comment is off. You claim to be using bits 24-27, leaving 28-31 free, but in fact you're using bits 28-31! BTW You're introducing quite a few lines over 80 chars. Perhaps cut back a bit? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-29 07:04 Message: Logged In: YES user_id=33168 Originator: YES I forgot to mention this patch works by using unused bits in tp_flags. This saves a function call when checking for a subclass of a builtin type. There's one funky thing about this patch, the change to Objects/exceptions.c. I didn't investigate why this was necessary, or more likely I did know why when I added it and forgot. I know that without adding BASE_EXC_SUBCLASS to tp_flags, test_exceptions fails. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 From noreply at sourceforge.net Sat Jan 6 15:54:28 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 06:54:28 -0800 Subject: [Patches] [ python-Patches-1624059 ] fast subclasses of builtin types Message-ID: Patches item #1624059, was opened at 2006-12-29 07:01 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Guido van Rossum (gvanrossum) Summary: fast subclasses of builtin types Initial Comment: This is similar to a patch posted on python-dev a few months ago (or more). I modified it to also handle subclassing exceptions which should speed up exception handling a bit. (This was proposed by Guido based on the original patch.) I also dropped an extra bit that was going to indicate if it was a builtin type or a subclass of a builtin type. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 15:54 Message: Logged In: YES user_id=21627 Originator: NO File Added: a.c ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 15:24 Message: Logged In: YES user_id=21627 Originator: NO I made a couple of assembler experiments (see attached a.c), with gcc 4.1 on x86. A "bit mask enumeration" test (f) compiles into four instructions:

    movl 8(%eax), %eax
    andl $-268435456, %eax
    cmpl $1879048192, %eax
    je .L18

(fall-through being the else case) A single bit test of a flag (g) compiles to two instructions:

    testl $-1073741824, 8(%eax)
    je .L9

(fall-through being the if case) Adding an identity test (comparison with the address of a global), followed by a bit mask test (h), compiles into six instructions:

    cmpl $int_type, %eax
    je .L2
    movl 8(%eax), %eax
    andl $-268435456, %eax
    cmpl $1879048192, %eax
    je .L2

(fall-through being the else case) In the common case, only two of these instructions are executed. So all-in-all, I would agree with Guido that adding bit flags is more efficient. However, existing bits cannot be recycled: in existing binary extension modules, these flags are set, so if the modules don't get recompiled, the type check would believe that the types are subtypes. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:59 Message: Logged In: YES user_id=6380 Originator: NO This looks fine, but I have some questions about alternative implementations: - Why does the typical PyFoo_Check() macro first call PyFoo_CheckExact() before calling the fast bit checking macro? Did you measure that this is in fact faster? True, it always means a pointer deref, so maybe it is -- but OTOH it is more instructions. - Why not have a separate bit for each type? Then you could make the fast macro test for (flags & mask) != 0 instead of testing for (flags & mask) == value. It would use up all the remaining bits, but I suspect there are some unused (or reusable) bits in lower positions: 1L<<2 is unused (was GC), and 1L<<11 also seems unused. And bits 18 through 23! And I'm guessing that INPLACEOPS (1L<<3) isn't all that interesting any more; they were introduced in 2.0... So it really looks like you have plenty of bits. Of course I don't know if it matters; would be worth it perhaps to look at the machine code. - Oops, it looks like your comment is off. You claim to be using bits 24-27, leaving 28-31 free, but in fact you're using bits 28-31! BTW You're introducing quite a few lines over 80 chars. Perhaps cut back a bit? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-29 07:04 Message: Logged In: YES user_id=33168 Originator: YES I forgot to mention this patch works by using unused bits in tp_flags. This saves a function call when checking for a subclass of a builtin type. There's one funky thing about this patch, the change to Objects/exceptions.c. I didn't investigate why this was necessary, or more likely I did know why when I added it and forgot. I know that without adding BASE_EXC_SUBCLASS to tp_flags, test_exceptions fails. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 From noreply at sourceforge.net Sat Jan 6 21:05:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 12:05:22 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Tony Lownds (tonylownds) Date: 2007-01-06 20:05 Message: Logged In: YES user_id=24100 Originator: YES Change peepholer to not bail in the presence of EXTENDED_ARG + MAKE_FUNCTION. Enforce the natural 16-bit limit of annotations in compile.c.
File Added: peepholer_and_max_annotations.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 17:53 Message: Logged In: YES user_id=6380 Originator: NO I like the following approach: (1) the old API continues to work for all functions, but provides incomplete information (not losing the kw-only args completely, but losing the fact that they are kw-only); (2) add a new API that provides all the relevant information. Maybe the new API should not return a 7-tuple but rather a structure with named attributes; that makes it more future-proof. Sorry, I don't have any good suggestions for new names. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 07:12 Message: Logged In: YES user_id=24100 Originator: YES For getargs and getargvalues, including the names in positional args is an excellent strategy. There are uses (in cgitb) in the stdlib for getargvalues that then wouldn't need to be changed. The 2 uses of getargspec in the stdlib (one of which I missed, in DocXMLRPCServer) are both closely followed by formatargspec. I think those APIs should change or information will be lost. Alternatively, a new function (hopefully with a better name than getfullargspec :) could be made and getargspec could retain its API, but raise an error when keyword-only arguments are present.

    def getargspec(func):
        args, varargs, kwonlyargs, kwdefaults, varkw, defaults, ann = getfullargspec(func)
        if kwonlyargs:
            raise ValueError, "function has keyword-only arguments, use getfullargspec!"
        return args, varargs, varkw, defaults

I'll update the patch to fix getargvalues and DocXMLRPCServer this weekend. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 05:22 Message: Logged In: YES user_id=6380 Originator: NO Well, it depends on the context whether that matters. The kw-only args could just be included in the positional args (which have names anyway) and that wouldn't be so bad for some apps. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 05:17 Message: Logged In: YES user_id=24100 Originator: YES I think everyone will have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 06:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch.
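To make the division of labor discussed above concrete, here is how the split would look from the caller's side (a sketch only: it assumes the patch's py3k branch, the 7-tuple order follows Tony's comment, and the function f is invented):

    import inspect

    def f(a, b=1, *rest, key=None):
        pass

    # New API: complete information, including the keyword-only 'key'.
    (args, varargs, kwonlyargs, kwdefaults,
     varkw, defaults, ann) = inspect.getfullargspec(f)

    # Old API: refuses instead of silently dropping the kw-only part.
    inspect.getargspec(f)   # would raise ValueError under the sketch above
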
---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 03:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things. The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 01:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 01:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 00:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session: >>> x = object() >>> import sys >>> sys.getrefcount(x) 2 >>> for i in range(100): ... def f(x: x): pass ... >>> del f >>> sys.getrefcount(x) 102 >>> At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 19:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for: def intmin(*a: int) -> int: pass ...help(intmin) will display: intmin(*a: int) -> int File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 15:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 21:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 20:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. 
Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 20:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 22:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 18:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 09:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 08:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? 
In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? (Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-20 01:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 17:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-04 01:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
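Condensing the thread, the observable behavior of the feature is roughly this (a sketch against the p3yk branch with the patch applied, reusing the thread's own intmin example):

    # Annotation expressions are evaluated at 'def' time and collected into
    # the new func_annotations attribute, keyed by argument name, with the
    # return annotation stored under the key 'return'.
    def intmin(*a: int) -> int:
        return min(a)

    intmin.func_annotations   # == {'a': int, 'return': int}
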
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Sat Jan 6 22:03:32 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 13:03:32 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Tony Lownds (tonylownds) Date: 2007-01-06 21:03 Message: Logged In: YES user_id=24100 Originator: YES I tried to implement getargspec() as described, and unfortunately there is another wrinkle to consider. Keyword-only arguments may or may not have defaults. So the invariant described in getargspec()'s docstring can't be maintained when simply appending keyword-only arguments. A tuple of four things is returned: (args, varargs, varkw, defaults). 'args' is a list of the argument names (it may contain nested lists). 'args' will include keyword-only argument names. 'varargs' and 'varkw' are the names of the * and ** arguments or None. 'defaults' is an n-tuple of the default values of the last n arguments. The attached patch adds an 'getfullargspec' API that returns complete information; 'getargspec' raises an error if information would be lost; the order of arguments in 'formatargspec' is backwards compatible, so that formatargspec(*getargspec(f)) == formatargspec(*getfullargspec(f)) when getargspec(f) does not raise an error. PEP 362 could and probably should replace the new getfullargspec() function, so I did not implement an API more complicated than a tuple. File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-06 20:05 Message: Logged In: YES user_id=24100 Originator: YES Change peepholer to not bail in the presence of EXTENDED_ARG + MAKE_FUNCTION. Enforce the natural 16-bit limit of annotations in compile.c. 
File Added: peepholer_and_max_annotations.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 17:53 Message: Logged In: YES user_id=6380 Originator: NO I like the following approach: (1) the old API continues to work for all functions, but provides incomplete information (not losing the kw-only args completely, but losing the fact that they are kw-only); (2) add a new API that provides all the relevant information. Maybe the new API should not return a 7-tuple but rather a structure with named attributes; that makes it more future-proof. Sorry, I don't have any good suggestions for new names. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 07:12 Message: Logged In: YES user_id=24100 Originator: YES For getargs and getargvalues, including the names in positional args is an excellent strategy. There are uses (in cgitb) in the stdlib for getargvalues that then wouldn't need to be changed. The 2 uses of getargspec in the stdlib (one of which I missed, in DocXMLRPCServer) are both closely followed by formatargspec. I think those APIs should change or information will be lost. Alternatively, a new function (hopefully with a better name than getfullargspec :) could be made and getargspec could retain its API, but raise an error when keyword-only arguments are present.

    def getargspec(func):
        args, varargs, kwonlyargs, kwdefaults, varkw, defaults, ann = getfullargspec(func)
        if kwonlyargs:
            raise ValueError, "function has keyword-only arguments, use getfullargspec!"
        return args, varargs, varkw, defaults

I'll update the patch to fix getargvalues and DocXMLRPCServer this weekend. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 05:22 Message: Logged In: YES user_id=6380 Originator: NO Well, it depends on the context whether that matters. The kw-only args could just be included in the positional args (which have names anyway) and that wouldn't be so bad for some apps. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 05:17 Message: Logged In: YES user_id=24100 Originator: YES I think everyone will have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 06:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch.
---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 03:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things. The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 01:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 01:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 00:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session: >>> x = object() >>> import sys >>> sys.getrefcount(x) 2 >>> for i in range(100): ... def f(x: x): pass ... >>> del f >>> sys.getrefcount(x) 102 >>> At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 19:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for: def intmin(*a: int) -> int: pass ...help(intmin) will display: intmin(*a: int) -> int File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 15:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 21:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 20:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. 
Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 20:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 22:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 18:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 09:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 08:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? 
In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? (Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-20 01:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 17:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-04 01:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Sun Jan 7 02:50:18 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 17:50:18 -0800 Subject: [Patches] [ python-Patches-1597850 ] Cross compiling patches for MINGW Message-ID: Patches item #1597850, was opened at 2006-11-16 16:57 Message generated for change (Comment added) made by rmt38 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Han-Wen Nienhuys (hanwen) Assigned to: Nobody/Anonymous (nobody) Summary: Cross compiling patches for MINGW Initial Comment: Hello, attached tarbal is a patch bomb of 32 patches against python 2.5, that we lilypond developers use for crosscompiling python. The patches were originally written by Jan Nieuwenhuizen, my codeveloper. These patches have been tested with Linux/x86, linux/x64 and macos 10.3 as build host and linux-{ppc,x86,x86_64}, freebsd, mingw as target platform. All packages at lilypond.org/install/ except for darwin contain the x-compiled python. Each patch is prefixed with a small comment, but for reference, I include a snippet from the readme. It would be nice if at least some of the patches were included. In particular, I think that X-compiling is a common request, so it warrants inclusion. Basically, what we do is override autoconf and Makefile settings through setting enviroment variables. **README section** Cross Compiling --------------- Python can be cross compiled by supplying different --build and --host parameters to configure. Python is compiled on the "build" system and executed on the "host" system. Cross compiling python requires a native Python on the build host, and a natively compiled tool `Pgen'. Before cross compiling, Python must first be compiled and installed on the build host. The configure script will use `cc' and `python', or environment variables CC_FOR_BUILD or PYTHON_FOR_BUILD, eg: CC_FOR_BUILD=gcc-3.3 \ PYTHON_FOR_BUILD=python2.4 \ .../configure --build=i686-linux --host=i586-mingw32 Cross compiling has been tested under linux, mileage may vary for other platforms. A few reminders on using configure to cross compile: - Cross compile tools must be in PATH, - Cross compile tools must be prefixed with the host type (ie i586-mingw32-gcc, i586-mingw32-ranlib, ...), - CC, CXX, AR, and RANLIB must be undefined when running configure, they will be auto-detected. If you need a cross compiler, Debian ships several several (eg: avr, m68hc1x, mingw32), while dpkg-cross easily creates others. Otherwise, check out Dan Kegel's crosstool: http://www.kegel.com/crosstool . ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 01:50 Message: Logged In: YES user_id=1417949 Originator: NO This: AC_CHECK_FILE(/dev/ptmx, AC_DEFINE(HAVE_DEV_PTMX, 1, [Define if we have /dev/ptmx.])) Is being translated into: echo "$as_me:$LINENO: checking for /dev/ptmx" >&5 echo $ECHO_N "checking for /dev/ptmx... 
&6">
$ECHO_C" >&6 if test "${ac_cv_file__dev_ptmx+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else test "$cross_compiling" = yes && { { echo "$as_me:$LINENO: error: cannot check for file existence when cross compiling" >&5 echo "$as_me: error: cannot check for file existence when cross compiling" >&2;} { (exit 1); exit 1; }; } if test -r "/dev/ptmx"; then ac_cv_file__dev_ptmx=yes else ac_cv_file__dev_ptmx=no fi fi Which exits when I do: $ export CC_FOR_BUILD=gcc $ sh configure --host=arm-eabi With an error like: checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling I am using the latest version of msys/mingw with devkitarm to cross compile. Is this supposed to happen? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:50 Message: Logged In: YES user_id=161998 Originator: YES this is a patch against an SVN checkout from last week. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:48 Message: Logged In: YES user_id=161998 Originator: YES With cross.patch I've been able to build a working freebsd python on linux. Since you had few problems with the X-compile patches, I'm resubmitting those first. I'd like to give our (admittedly: oddball) mingw version another go when the X-compile patches are in python SVN. Regarding your comments: * what would be a better way to import the SO setting? the most reliable way to get something out of a makefile into python is VAR=foo export VAR .. os.environ['VAR'] this doesn't introduce any fragility in parsing/expanding/(un)quoting, so it's actually pretty good (a sketch follows at the end of this thread). Right now, I'm overriding sysconfig wholesale in setup.py with a sysconfig._config_vars.update (os.environ) but I'm not sure that this affects the settings in build_ext.py. A freebsd -> linux compile does not touch that code, so if you dislike it, we can leave it out. * I've documented the .x extension File Added: cross.patch ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:12 Message: Logged In: YES user_id=21627 Originator: NO One more note: it would be best if the patches were against the subversion trunk. They won't be included in the 2.5 maintenance branch (as they are a new feature), so they need to be ported to the trunk, anyway. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:06 Message: Logged In: YES user_id=21627 Originator: NO I'll add my comments as I go through the patches. cab1e7d1e54d14a8aab52f0c3b3073c93f75d4fc: - why is there now a mingw32msvc2 platform? If the target is mingw (rather than Cygwin), I'd expect that the target is just Win32/Windows, and that all symbolic constants provided be usable across all Win32 Pythons. - why is h2py run for /usr/include/netinet/in.h? Shouldn't it operate on a target header file? - please include any plat-* files that you generate in the patch. - why do you need dl_nt.c in Modules? Please make it use the one from PC (consider updating the comment about calling initall) b52dbbbbc3adece61496b161d8c22599caae2311 - please combine all patches adding support for __MINGW32__ into a single one. Why is anything needed here at all? I thought Python compiles already with mingw32 (on Windows)? - what is the exclusion of freezing for?
059af829d362b10bb5921367c93a56dbb51ef31b - Why are you taking timeval from winsock2.h? It should come from sys/time.h, and does in my copy of Debian mingw32-runtime. 6a742fb15b28564f9a1bc916c76a28dc672a9b2c - Why are these changes needed? It's Windows, and that is already supported. a838b4780998ef98ae4880c3916274d45b661c82 - Why doesn't that already work on Windows+cygwin+mingw32? f452fe4b95085d8c1ba838bf302a6a48df3c1d31 - I think this should target msvcr71.dll, not msvcrt.dll Please also combine the cross-compilation patches into a single one. - there is no need to provide pyconfig.h.in changes; I'll regenerate that, anyway. 9c022e407c366c9f175e9168542ccc76eae9e3f0 - please integrate those into the large AC_CHECK_FUNCS that already exists 540684d696df6057ee2c9c4e13e33fe450605ffa - Why are you stripping -Wl? 64f5018e975419b2d37c39f457c8732def3288df - Try getting SO from the Makefile, not from the environment (I assume this is also meant to support true distutils packages some day). 7a4e50fb1cf5ff3481aaf7515a784621cbbdac6c - again: what is the "mingw" platform? 7d3a45788a0d83608d10e5c0a34f08b426d62e92 - is this really necessary? I suggest dropping it 23a2dd14933a2aee69f7cdc9f838e4b9c26c1eea - don't include bits/time.h; it's not meant for direct inclusion 6689ca9dea07afbe8a77b7787a5c4e1642f803a1 - what's a .x file? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-25 15:12 Message: Logged In: YES user_id=161998 Originator: YES I've sent the agreement by snail mail. ---------------------------------------------------------------------- Comment By: Jan Nieuwenhuizen (janneke-sf) Date: 2006-11-17 19:57 Message: Logged In: YES user_id=1368960 Originator: NO I do not mind either. I've just signed and faxed contrib-form.html. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:33 Message: Logged In: YES user_id=161998 Originator: YES note that not all of the patch needs to go in its current form. In particular, setup.py should be much smarter about looking in the build root to find libs and include files. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:32 Message: Logged In: YES user_id=161998 Originator: YES I don't mind, and I expect Jan won't have a problem either. What's the procedure: do we send the disclaimer first, or do you do the review, or does everything happen in parallel? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-11-16 21:47 Message: Logged In: YES user_id=21627 Originator: NO Would you and Jan Nieuwenhuizen be willing to sign the contributor agreement, at http://www.python.org/psf/contrib.html I haven't reviewed the patch yet; if they can be integrated, that will only happen in the trunk (i.e. not for 2.5.x).
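A minimal sketch of the Makefile-to-Python handoff hanwen describes above; the variable name SO comes from the thread, and the fallback value is an assumption:

    # Reading a Makefile-exported setting from the environment.
    # The Makefile side would contain:
    #     SO = .so
    #     export SO
    # after which the value is visible to any Python child process,
    # with no parsing/expanding/(un)quoting of the Makefile needed.
    import os

    so_suffix = os.environ.get("SO", ".so")  # default is an assumption
    print "extension module suffix:", so_suffix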
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 From noreply at sourceforge.net Sun Jan 7 03:37:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 18:37:11 -0800 Subject: [Patches] [ python-Patches-1597850 ] Cross compiling patches for MINGW Message-ID: Patches item #1597850, was opened at 2006-11-16 16:57 Message generated for change (Comment added) made by hanwen You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Han-Wen Nienhuys (hanwen) Assigned to: Nobody/Anonymous (nobody) Summary: Cross compiling patches for MINGW Initial Comment: Hello, the attached tarball is a patch bomb of 32 patches against Python 2.5 that we LilyPond developers use for cross-compiling Python. The patches were originally written by Jan Nieuwenhuizen, my co-developer. These patches have been tested with Linux/x86, Linux/x64 and MacOS 10.3 as build hosts and linux-{ppc,x86,x86_64}, freebsd, mingw as target platforms. All packages at lilypond.org/install/ except for darwin contain the x-compiled python. Each patch is prefixed with a small comment, but for reference, I include a snippet from the README. It would be nice if at least some of the patches were included. In particular, I think that X-compiling is a common request, so it warrants inclusion. Basically, what we do is override autoconf and Makefile settings by setting environment variables. **README section** Cross Compiling --------------- Python can be cross compiled by supplying different --build and --host parameters to configure. Python is compiled on the "build" system and executed on the "host" system. Cross compiling python requires a native Python on the build host, and a natively compiled tool `Pgen'. Before cross compiling, Python must first be compiled and installed on the build host. The configure script will use `cc' and `python', or environment variables CC_FOR_BUILD or PYTHON_FOR_BUILD, e.g.: CC_FOR_BUILD=gcc-3.3 \ PYTHON_FOR_BUILD=python2.4 \ .../configure --build=i686-linux --host=i586-mingw32 Cross compiling has been tested under Linux; mileage may vary for other platforms. A few reminders on using configure to cross compile: - Cross compile tools must be in PATH, - Cross compile tools must be prefixed with the host type (i.e. i586-mingw32-gcc, i586-mingw32-ranlib, ...), - CC, CXX, AR, and RANLIB must be undefined when running configure; they will be auto-detected. If you need a cross compiler, Debian ships several (e.g. avr, m68hc1x, mingw32), while dpkg-cross easily creates others. Otherwise, check out Dan Kegel's crosstool: http://www.kegel.com/crosstool . ---------------------------------------------------------------------- >Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx...
configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx (a sketch follows at the end of this thread). ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 01:50 Message: Logged In: YES user_id=1417949 Originator: NO This: AC_CHECK_FILE(/dev/ptmx, AC_DEFINE(HAVE_DEV_PTMX, 1, [Define if we have /dev/ptmx.])) Is being translated into: echo "$as_me:$LINENO: checking for /dev/ptmx" >&5 echo $ECHO_N "checking for /dev/ptmx... $ECHO_C" >&6 if test "${ac_cv_file__dev_ptmx+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else test "$cross_compiling" = yes && { { echo "$as_me:$LINENO: error: cannot check for file existence when cross compiling" >&5 echo "$as_me: error: cannot check for file existence when cross compiling" >&2;} { (exit 1); exit 1; }; } if test -r "/dev/ptmx"; then ac_cv_file__dev_ptmx=yes else ac_cv_file__dev_ptmx=no fi fi Which exits when I do: $ export CC_FOR_BUILD=gcc $ sh configure --host=arm-eabi With an error like: checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling I am using the latest version of msys/mingw with devkitarm to cross compile. Is this supposed to happen? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:50 Message: Logged In: YES user_id=161998 Originator: YES this is a patch against an SVN checkout from last week. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:48 Message: Logged In: YES user_id=161998 Originator: YES With cross.patch I've been able to build a working freebsd python on linux. Since you had few problems with the X-compile patches, I'm resubmitting those first. I'd like to give our (admittedly: oddball) mingw version another go when the X-compile patches are in python SVN. Regarding your comments: * what would be a better way to import the SO setting? the most reliable way to get something out of a makefile into python is VAR=foo export VAR .. os.environ['VAR'] this doesn't introduce any fragility in parsing/expanding/(un)quoting, so it's actually pretty good. Right now, I'm overriding sysconfig wholesale in setup.py with a sysconfig._config_vars.update (os.environ) but I'm not sure that this affects the settings in build_ext.py. A freebsd -> linux compile does not touch that code, so if you dislike it, we can leave it out. * I've documented the .x extension File Added: cross.patch ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:12 Message: Logged In: YES user_id=21627 Originator: NO One more note: it would be best if the patches were against the subversion trunk. They won't be included in the 2.5 maintenance branch (as they are a new feature), so they need to be ported to the trunk, anyway. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:06 Message: Logged In: YES user_id=21627 Originator: NO I'll add my comments as I go through the patches. cab1e7d1e54d14a8aab52f0c3b3073c93f75d4fc: - why is there now a mingw32msvc2 platform? If the target is mingw (rather than Cygwin), I'd expect that the target is just Win32/Windows, and that all symbolic constants provided be usable across all Win32 Pythons. - why is h2py run for /usr/include/netinet/in.h?
Shouldn't it operate on a target header file? - please include any plat-* files that you generate in the patch. - why do you need dl_nt.c in Modules? Please make it use the one from PC (consider updating the comment about calling initall) b52dbbbbc3adece61496b161d8c22599caae2311 - please combine all patches adding support for __MINGW32__ into a single one. Why is anything needed here at all? I thought Python compiles already with mingw32 (on Windows)? - what is the exclusion of freezing for? 059af829d362b10bb5921367c93a56dbb51ef31b - Why are you taking timeval from winsock2.h? It should come from sys/time.h, and does in my copy of Debian mingw32-runtime. 6a742fb15b28564f9a1bc916c76a28dc672a9b2c - Why are these changes needed? It's Windows, and that is already supported. a838b4780998ef98ae4880c3916274d45b661c82 - Why doesn't that already work on Windows+cygwin+mingw32? f452fe4b95085d8c1ba838bf302a6a48df3c1d31 - I think this should target msvcr71.dll, not msvcrt.dll Please also combine the cross-compilation patches into a single one. - there is no need to provide pyconfig.h.in changes; I'll regenerate that, anyway. 9c022e407c366c9f175e9168542ccc76eae9e3f0 - please integrate those into the large AC_CHECK_FUNCS that already exists 540684d696df6057ee2c9c4e13e33fe450605ffa - Why are you stripping -Wl? 64f5018e975419b2d37c39f457c8732def3288df - Try getting SO from the Makefile, not from the environment (I assume this is also meant to support true distutils packages some day). 7a4e50fb1cf5ff3481aaf7515a784621cbbdac6c - again: what is the "mingw" platform? 7d3a45788a0d83608d10e5c0a34f08b426d62e92 - is this really necessary? I suggest dropping it 23a2dd14933a2aee69f7cdc9f838e4b9c26c1eea - don't include bits/time.h; it's not meant for direct inclusion 6689ca9dea07afbe8a77b7787a5c4e1642f803a1 - what's a .x file? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-25 15:12 Message: Logged In: YES user_id=161998 Originator: YES I've sent the agreement by snail mail. ---------------------------------------------------------------------- Comment By: Jan Nieuwenhuizen (janneke-sf) Date: 2006-11-17 19:57 Message: Logged In: YES user_id=1368960 Originator: NO I do not mind either. I've just signed and faxed contrib-form.html. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:33 Message: Logged In: YES user_id=161998 Originator: YES note that not all of the patch needs to go in its current form. In particular, setup.py should be much smarter about looking in the build root to find libs and include files. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:32 Message: Logged In: YES user_id=161998 Originator: YES I don't mind, and I expect Jan won't have a problem either. What's the procedure: do we send the disclaimer first, or do you do the review, or does everything happen in parallel? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-11-16 21:47 Message: Logged In: YES user_id=21627 Originator: NO Would you and Jan Nieuwenhuizen be willing to sign the contributor agreement, at http://www.python.org/psf/contrib.html I haven't reviewed the patch yet; if they can be integrated, that will only happen in the trunk (i.e. not for 2.5.x).
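A hypothetical sketch of the config.cache approach hanwen suggests at the top of this thread; the cache variable name comes from the error output above, and whether the target actually has /dev/ptmx is something the builder must know or assume:

    # Pre-seed autoconf answers that configure cannot probe when cross
    # compiling.  The chosen value ("no" here) is an assumption about
    # the target; adjust it to match the real target system.
    entries = {"ac_cv_file__dev_ptmx": "no"}

    with open("config.cache", "a") as f:
        for name, value in entries.items():
            # autoconf cache entries use the ${var=value} default form
            f.write("%s=${%s=%s}\n" % (name, name, value))

    # Then run, for example:
    #   sh configure --host=arm-eabi --cache-file=config.cache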
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 From noreply at sourceforge.net Sun Jan 7 05:42:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 20:42:22 -0800 Subject: [Patches] [ python-Patches-909005 ] asyncore fixes and improvements Message-ID: Patches item #909005, was opened at 2004-03-03 05:07 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexey Klimkin (klimkin) Assigned to: A.M. Kuchling (akuchling) Summary: asyncore fixes and improvements Initial Comment: Minor: * 0/1 for boolean values replaced with False/True. * (887279) Added handling of POLLPRI as POLLIN. POLLERR, POLLHUP, POLLNVAL are handled as an exception event. handle_expt_event gets the most recent error from the self.socket object and raises socket.error. * Default readable()/writable() returns False. * Added "map" parameter for file_dispatcher. * file_wrapper: removed "return" in close(); recv/read and send/write swapped because of their nature. * mac code for writable() removed. The manual for accept() on mac is similar to the one on linux. * Repeating exception changed from "raise socket.error, why" to raise. * Added connected/accepting/addr reset on close(). Initialization of variables moved to __init__. * close_all() now calls close() for each dispatcher object; EBADF is treated as an already closed socket/file. * Added channel id to "unhandled..." messages. Bugs: * Fixed bug (654766,889153): client never gets connected, nor errored. A connecting client gets a writable event from select(); however, some clients may want to always be non-writable. Such a client may never get connected. The fix adds _readable() - always True for an accepting and always False for a connecting socket; and _writable() - always False for an accepting and always True for a connecting socket. This implies that the listening dispatcher's readable() and writable() will never be called. ("man accept" and "man connect" for non-blocking sockets). * Fixed bug: error handling after accept(). It's said that accept() can return EWOULDBLOCK even for a readable socket. This means that even after handle_accept(), the dispatcher's accept() may still raise EWOULDBLOCK. The new code does accept() itself and stores the accepted socket in self.__pending_accept. If there was a socket.error, it's treated as EWOULDBLOCK. The dispatcher's accept() returns self.__pending_accept and resets it to None. Features: * Added pending_read() and pending_write(). These functions help to use a dispatcher over non-socket objects with buffering capabilities.
In the original dispatcher, if the socket does a buffered read and some data is in the buffer, entering asyncore.poll() doesn't finish, since there is no data in the real file/socket. This feature allows using an SSL socket, since that socket reads data in 16k chunks. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 20:42 Message: Logged In: YES user_id=341410 Originator: NO Many of the changes in the source provided by klimkin in his most recent revision from February 27, 2005 seek to solve certain problems in an inconsistent or incorrect way. Some of his changes (or variants thereof) are worthwhile. I'll start with my issues with his asyncore changes, then describe what I think should be added from them. For example, in his updated asyncore.py, the list of sockets is first shuffled randomly, then sorted based on priority. Assuming that one ignored priorities for a moment, if there were more sockets than the max sockets for the platform, then due to the limitations of randomness, there would be no guarantee that all sockets would get polled. Say, for example, that one were using Windows and were running close to the actual select file handle limit (512 in Python 2.3) with 500 handles; you would skip 436 of the sockets *this pass*. In 10 passes, there would have been 100 sockets that were never polled. In 20 passes, there would still be, on average, 20 that were never polled. So this "randomization" step is the wrong thing to do, unless you actually make multiple select calls for each poll() call. But really, select is limited by 512, and I've run it with 500 without issue. The priority-based sorting has many of the same problems, but it is even worse when you have nontrivial numbers of differing priorities, regardless of randomization. The max socket limit of 64 on Windows isn't correct. It's been 512 since at least Python 2.3. And all other platforms being 65536? No. I've had some versions of Linux die on me at 512, others at 4096, but all were dog slow beyond 500 or so. It's better to let the underlying system raise an exception for the user when it fails and let them attempt to tune it, rather than forcing a tuning that may not be correct. The "pending read" stuff is also misdirected. Assuming a non-broken async client or server, either should be handling content as it comes in, dispatching as necessary. See asynchat.collect_incoming_data() and asynchat.found_terminator() for examples. The idispatcher stuff seems unnecessary. Generally speaking, it seems to me that there are 3 levels of abstraction going on: 1) handle_*_event(), called by poll, poll2, etc. 2) handle_*(), called by handle_*_event(), user overrides, calls other handle_*() and *() methods 3) *() (aka recv, send, close, etc.), called by handle_*(), generally left alone. Some of your code breaks the abstraction and has items in layer 2 call items in layer 1, which then call items in layer 2 again. This seems unnecessary, and breaks the general downward calling semantic (except in the case of errors returned by layer 3 resulting in layer 2 handle_close() calls, which is the proper method to call). There are, according to my reading of the asyncore portions of your included module, a few things that may be worthy of inclusion in the Python standard library: * A variant of your changes to close_all(), though it should proceed in closing everything unless a KeyboardInterrupt, SystemExit, or ExitNow exception is raised.
Socket errors should be ignored, because we are closing them - we don't care about their error condition. * Checking sockets for socket errors via socket.getsockopt() (a sketch follows below). * A variant of your .close() implementation. * The CONNRESET, etc., stuff in the send() and recv() methods, but not the handle_close_event() replacements; stick with handle_close(). * Checking for KeyboardInterrupt and SystemExit inside the poll functions. * The _closed_socket class and initialization. All but the last of the above I would consider to be bugfixes, and if others agree that these are reasonable changes, I'll write up a patch against trunk and 2.5 maintenance. The last change, while I think it would be nice, probably shouldn't be included in 2.5 maintenance, though I think it would be fine for the trunk. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2005-02-26 13:39 Message: Logged In: YES user_id=410460 Minor improvements: * Added handle_close_event(): calls handle_close(), then closes the channel. No need to write self.close() in each handle_close(). * Improved exception handling. KeyboardInterrupt is not blocked. For a Python exception, handle_error_event() is called, which checks for KeyboardInterrupt and closes the socket if handle_error() didn't. Bugs: * Calling connect() could raise an exception and not hit handle_error(). Now if there was an exception, handle_error_event() is called. Features: * set_timeout(): Sets a timeout for the dispatcher object; if there was no I/O for the object, raises ETIMEDOUT, which is handled by handle_error_event(). * Fixed issue with Windows - too many descriptors in select(). The list of sockets is shuffled and only the first asyncore.max_channels are used in select(). * Added set_prio(): Sets a priority for the dispatcher. After the shuffle, the list of sockets is sorted by priority. You may also check asynhttplib - an asynchronous version of httplib. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-07-02 06:44 Message: Logged In: YES user_id=410460 In addition to "[ 909005 ] asyncore fixes and improvements" and CVS version "asyncore.py,v 2.51", this patch provides: * Added handling of buffered socket layer (pending_read(), pending_write()). * Added fd number for __repr__. * Initialized self.socket = socket._closedsocket() instead of None for verbose error output (like closed socket.socket). * asyncore and asynchat implement idispatcher and iasync_chat. * Fixed self.addr initialization. * Removed import exceptions. * Don't filter KeyboardInterrupt, just pass it through. * Added queue of sockets, which solves the problem of select() on too many descriptors. I have run make test in the Python CVS distribution without problems. Examples of using i* included. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-06-05 10:54 Message: Logged In: YES user_id=11375 I've struggled to get the test suite running without errors on my machine, but have failed. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-21 22:15 Message: Logged In: YES user_id=410460 There is no real reason for this change, please undo. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:18 Message: Logged In: YES user_id=11375 In your version of file_dispatch.__init__, the .set_file() call is moved earlier; can you say why?
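The getsockopt() check proposed in the bullet list above can be sketched as follows; the helper name and the localhost port are illustrative assumptions:

    import errno, select, socket

    def pending_error(sock):
        # SO_ERROR returns (and clears) the pending error code on a
        # socket; 0 means no error.  This is the check suggested above.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)

    # Usage sketch: a non-blocking connect to a port that is probably
    # closed, so the deferred failure shows up via SO_ERROR.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(0)
    s.connect_ex(("127.0.0.1", 9))     # EINPROGRESS/EWOULDBLOCK expected
    select.select([], [s], [], 5.0)    # on POSIX, writable when decided
    code = pending_error(s)
    if code:
        print "connect failed:", errno.errorcode.get(code, code)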
---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:13 Message: Logged In: YES user_id=11375 Added "map" parameter for file_dispatcher and dispatcher_with_send in CVS HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:08 Message: Logged In: YES user_id=11375 Repeating exception changes ('raise socket.error' -> just 'raise') checked into HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:02 Message: Logged In: YES user_id=11375 Mac code for writable() removed from HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:02 Message: Logged In: YES user_id=11375 Patch to use True/False applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 11:55 Message: Logged In: YES user_id=11375 Fix for bug #887279 applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 11:48 Message: Logged In: YES user_id=11375 The sheer number of changes in this patch makes it difficult to figure out which changes fix which problem. I've created a new directory in CVS, nondist/sandbox/asyncore, that contains copies of the module with these patches applied, and will work on applying changes to the copy in dist/src. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-16 23:15 Message: Logged In: YES user_id=410460 Sorry, unfortunately I have lost the old patch file. I have attached a new one. In addition to the fixes listed above, the patch includes: 1. Fix for operating on an uninitialized socket. self.socket now initializes with _closed_socket(), so any operation throws EBADF. 2. Added class idispatcher - a base class for dispatcher. The purpose of this class is to allow simple replacement of media (the dispatcher interface) in classes derived from the dispatcher class. This is based on 'object'. I have also attached asynchat.diff - an example for the new-style dispatcher. The old asynchat works as well. ---------------------------------------------------------------------- Comment By: Wummel (calvin) Date: 2004-03-11 07:49 Message: Logged In: YES user_id=9205 There is no file attached! You have to click on the checkbox next to the upload filename. This is a Sourceforge annoyance :( ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 From noreply at sourceforge.net Sun Jan 7 05:53:55 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 20:53:55 -0800 Subject: [Patches] [ python-Patches-909005 ] asyncore fixes and improvements Message-ID: Patches item #909005, was opened at 2004-03-03 05:07 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexey Klimkin (klimkin) Assigned to: A.M. Kuchling (akuchling) Summary: asyncore fixes and improvements Initial Comment: Minor: * 0/1 for boolean values replaced with False/True. * (887279) Added handling of POLLPRI as POLLIN. POLLERR, POLLHUP, POLLNVAL are handled as an exception event. handle_expt_event gets the most recent error from the self.socket object and raises socket.error. * Default readable()/writable() returns False. * Added "map" parameter for file_dispatcher. * file_wrapper: removed "return" in close(); recv/read and send/write swapped because of their nature. * mac code for writable() removed. The manual for accept() on mac is similar to the one on linux. * Repeating exception changed from "raise socket.error, why" to raise. * Added connected/accepting/addr reset on close(). Initialization of variables moved to __init__. * close_all() now calls close() for each dispatcher object; EBADF is treated as an already closed socket/file. * Added channel id to "unhandled..." messages. Bugs: * Fixed bug (654766,889153): client never gets connected, nor errored. A connecting client gets a writable event from select(); however, some clients may want to always be non-writable. Such a client may never get connected. The fix adds _readable() - always True for an accepting and always False for a connecting socket; and _writable() - always False for an accepting and always True for a connecting socket. This implies that the listening dispatcher's readable() and writable() will never be called. ("man accept" and "man connect" for non-blocking sockets). * Fixed bug: error handling after accept(). It's said that accept() can return EWOULDBLOCK even for a readable socket. This means that even after handle_accept(), the dispatcher's accept() may still raise EWOULDBLOCK. The new code does accept() itself and stores the accepted socket in self.__pending_accept. If there was a socket.error, it's treated as EWOULDBLOCK. The dispatcher's accept() returns self.__pending_accept and resets it to None. Features: * Added pending_read() and pending_write(). These functions help to use a dispatcher over non-socket objects with buffering capabilities. In the original dispatcher, if the socket does a buffered read and some data is in the buffer, entering asyncore.poll() doesn't finish, since there is no data in the real file/socket. This feature allows using an SSL socket, since that socket reads data in 16k chunks. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 20:53 Message: Logged In: YES user_id=341410 Originator: NO In asynchat, the only stuff that should be accepted is the handle_read() changes. The deque removal should be ignored (we have had deques since Python 2.4, and they are *significantly* faster than lists in nontrivial applications), and the iasync_chat stuff, like the idispatcher stuff, seems unnecessary. And that's pretty much it for asynchat. The proposed asynchttp module shouldn't go into the Python standard library until it has lived on its own for a nontrivial amount of time in the Cheeseshop and is found to be as good as httplib, urllib, or urllib2. Even then, its inclusion should be questioned, as medusa (the http server based on asyncore) has been around for a decade or more, is used in many places, and yet still isn't in the standard library. The asyncoreTest.py needs a bit of work (I notice some incorrect names), but could be used as an addition to the test suite (currently it seems as though only asynchat is tested).
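The deque point above is easy to verify; a rough timing sketch (container sizes and iteration counts are arbitrary, and absolute numbers vary by machine):

    # Popping from the front of a list is O(n); deque.popleft() is O(1).
    from timeit import Timer

    list_time = Timer("b.pop(0); b.append(0)",
                      "b = [0] * 10000").timeit(1000)
    deque_time = Timer("b.popleft(); b.append(0)",
                       "from collections import deque; "
                       "b = deque([0] * 10000)").timeit(1000)
    print "list: %.4fs   deque: %.4fs" % (list_time, deque_time)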
---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 20:42 Message: Logged In: YES user_id=341410 Originator: NO Many of the changes in the source provided by klimkin in his most recent revision from February 27, 2005 seek to solve certain problems in an inconsistent or incorrect way. Some of his changes (or variants thereof) are worthwhile. I'll start with my issues with his asyncore changes, then describe what I think should be added from them. For example, in his updated asyncore.py, the list of sockets is first shuffled randomly, then sorted based on priority. Assuming that one ignored priorities for a moment, if there were more sockets than the max sockets for the platform, then due to the limitations of randomness, there would be no guarantee that all sockets would get polled. Say, for example, that one were using Windows and were running close to the actual select file handle limit (512 in Python 2.3) with 500 handles; you would skip 436 of the sockets *this pass*. In 10 passes, there would have been 100 sockets that were never polled. In 20 passes, there would still be, on average, 20 that were never polled. So this "randomization" step is the wrong thing to do, unless you actually make multiple select calls for each poll() call. But really, select is limited by 512, and I've run it with 500 without issue. The priority-based sorting has many of the same problems, but it is even worse when you have nontrivial numbers of differing priorities, regardless of randomization. The max socket limit of 64 on Windows isn't correct. It's been 512 since at least Python 2.3. And all other platforms being 65536? No. I've had some versions of Linux die on me at 512, others at 4096, but all were dog slow beyond 500 or so. It's better to let the underlying system raise an exception for the user when it fails and let them attempt to tune it, rather than forcing a tuning that may not be correct. The "pending read" stuff is also misdirected. Assuming a non-broken async client or server, either should be handling content as it comes in, dispatching as necessary. See asynchat.collect_incoming_data() and asynchat.found_terminator() for examples. The idispatcher stuff seems unnecessary. Generally speaking, it seems to me that there are 3 levels of abstraction going on: 1) handle_*_event(), called by poll, poll2, etc. 2) handle_*(), called by handle_*_event(), user overrides, calls other handle_*() and *() methods 3) *() (aka recv, send, close, etc.), called by handle_*(), generally left alone. Some of your code breaks the abstraction and has items in layer 2 call items in layer 1, which then call items in layer 2 again. This seems unnecessary, and breaks the general downward calling semantic (except in the case of errors returned by layer 3 resulting in layer 2 handle_close() calls, which is the proper method to call). There are, according to my reading of the asyncore portions of your included module, a few things that may be worthy of inclusion in the Python standard library: * A variant of your changes to close_all(), though it should proceed in closing everything unless a KeyboardInterrupt, SystemExit, or ExitNow exception is raised. Socket errors should be ignored, because we are closing them - we don't care about their error condition. * Checking sockets for socket errors via socket.getsockopt(). * A variant of your .close() implementation.
* The CONNRESET, etc., stuff in the send() and recv() methods, but not the handle_close_event() replacements; stick with handle_close(). * Checking for KeyboardInterrupt and SystemExit inside the poll functions. * The _closed_socket class and initialization. All but the last of the above I would consider to be bugfixes, and if others agree that these are reasonable changes, I'll write up a patch against trunk and 2.5 maintenance. The last change, while I think it would be nice, probably shouldn't be included in 2.5 maintenance, though I think it would be fine for the trunk. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2005-02-26 13:39 Message: Logged In: YES user_id=410460 Minor improvements: * Added handle_close_event(): calls handle_close(), then closes the channel. No need to write self.close() in each handle_close(). * Improved exception handling. KeyboardInterrupt is not blocked. For a Python exception, handle_error_event() is called, which checks for KeyboardInterrupt and closes the socket if handle_error() didn't. Bugs: * Calling connect() could raise an exception and not hit handle_error(). Now if there was an exception, handle_error_event() is called. Features: * set_timeout(): Sets a timeout for the dispatcher object; if there was no I/O for the object, raises ETIMEDOUT, which is handled by handle_error_event(). * Fixed issue with Windows - too many descriptors in select(). The list of sockets is shuffled and only the first asyncore.max_channels are used in select(). * Added set_prio(): Sets a priority for the dispatcher. After the shuffle, the list of sockets is sorted by priority. You may also check asynhttplib - an asynchronous version of httplib. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-07-02 06:44 Message: Logged In: YES user_id=410460 In addition to "[ 909005 ] asyncore fixes and improvements" and CVS version "asyncore.py,v 2.51", this patch provides: * Added handling of buffered socket layer (pending_read(), pending_write()). * Added fd number for __repr__. * Initialized self.socket = socket._closedsocket() instead of None for verbose error output (like closed socket.socket). * asyncore and asynchat implement idispatcher and iasync_chat. * Fixed self.addr initialization. * Removed import exceptions. * Don't filter KeyboardInterrupt, just pass it through. * Added queue of sockets, which solves the problem of select() on too many descriptors. I have run make test in the Python CVS distribution without problems. Examples of using i* included. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-06-05 10:54 Message: Logged In: YES user_id=11375 I've struggled to get the test suite running without errors on my machine, but have failed. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-21 22:15 Message: Logged In: YES user_id=410460 There is no real reason for this change, please undo. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:18 Message: Logged In: YES user_id=11375 In your version of file_dispatch.__init__, the .set_file() call is moved earlier; can you say why? ---------------------------------------------------------------------- Comment By: A.M.
Kuchling (akuchling) Date: 2004-03-21 12:13 Message: Logged In: YES user_id=11375 Added "map" parameter for file_dispatcher and dispatcher_with_send in CVS HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:08 Message: Logged In: YES user_id=11375 Repeating exception changes ('raise socket.error' -> just 'raise') checked into HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:02 Message: Logged In: YES user_id=11375 Mac code for writable() removed from HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:02 Message: Logged In: YES user_id=11375 Patch to use True/False applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 11:55 Message: Logged In: YES user_id=11375 Fix for bug #887279 applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 11:48 Message: Logged In: YES user_id=11375 The sheer number of changes in this patch makes it difficult to figure out which changes fix which problem. I've created a new directory in CVS, nondist/sandbox/asyncore, that contains copies of the module with these patches applied, and will work on applying changes to the copy in dist/src. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-16 23:15 Message: Logged In: YES user_id=410460 Sorry, unfortunately I have lost the old patch file. I have attached a new one. In addition to the fixes listed above, the patch includes: 1. Fix for operating on an uninitialized socket. self.socket now initializes with _closed_socket(), so any operation throws EBADF. 2. Added class idispatcher - a base class for dispatcher. The purpose of this class is to allow simple replacement of media (the dispatcher interface) in classes derived from the dispatcher class. This is based on 'object'. I have also attached asynchat.diff - an example for the new-style dispatcher. The old asynchat works as well. ---------------------------------------------------------------------- Comment By: Wummel (calvin) Date: 2004-03-11 07:49 Message: Logged In: YES user_id=9205 There is no file attached! You have to click on the checkbox next to the upload filename. This is a Sourceforge annoyance :( ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 From noreply at sourceforge.net Sun Jan 7 06:08:32 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 21:08:32 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 01:37 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 21:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 7 06:19:55 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 21:19:55 -0800 Subject: [Patches] [ python-Patches-1617702 ] extended slicing for buffer objects Message-ID: Patches item #1617702, was opened at 2006-12-17 20:45 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617702&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Nobody/Anonymous (nobody) Summary: extended slicing for buffer objects Initial Comment: Extended slicing support for buffer objects, including slice assignment; I don't know of a way to test assignment, though. (Backported from the p3yk-noslice branch.) ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 21:19 Message: Logged In: YES user_id=341410 Originator: NO As per current trunk source, read-write buffers can only be created via the CPython API.
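To make the complexity question in the lazy-strings item above concrete, here is a toy model of lazy concatenation, in no way the actual patch: addition records its operands in O(1), while indexing walks the recorded tree, so (a + b + c + ...)[i] costs on the order of the number of concatenations:

    class LazyConcat:
        """Toy model (not the actual patch) of a lazily built string."""
        def __init__(self, left, right=""):
            self.left, self.right = left, right

        def __add__(self, other):
            # Concatenation just builds a node; no characters are copied.
            return LazyConcat(self, other)

        def __len__(self):
            return len(self.left) + len(self.right)

        def __getitem__(self, i):
            # Walk the deferred concatenations to locate index i
            # (negative indices are left out of the toy).
            node = self
            while isinstance(node, LazyConcat):
                n = len(node.left)
                if i < n:
                    node = node.left
                else:
                    i -= n
                    node = node.right
            return node[i]

    s = LazyConcat("lazy") + " " + "strings"
    print s[5]   # 's' -- forces a partial walk, not a full copy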
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617702&group_id=5470 From noreply at sourceforge.net Sun Jan 7 06:51:17 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 21:51:17 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 14:51 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Sun Jan 7 10:03:20 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 01:03:20 -0800 Subject: [Patches] [ python-Patches-1609282 ] #1603424 subprocess.py wrongly claims 2.2 compatibility. Message-ID: Patches item #1609282, was opened at 2006-12-05 16:16 Message generated for change (Comment added) made by astrand You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1609282&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Robert Carr (racarr) >Assigned to: Peter Åstrand (astrand) Summary: #1603424 subprocess.py wrongly claims 2.2 compatibility. Initial Comment: Simple fix restoring 2.2 compatibility in subprocess.py. This makes more sense than a list comprehension or constructing sets, in my opinion, even ignoring the bug. ---------------------------------------------------------------------- >Comment By: Peter Åstrand (astrand) Date: 2007-01-07 10:03 Message: Logged In: YES user_id=344921 Originator: NO This patch is rejected, due to the problem described in gbrandl's comment. Another fix has been submitted, though, which solves bug #1603424. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-12-08 21:55 Message: Logged In: YES user_id=849994 Originator: NO This patch changes semantics: if two names refer to the same fd, it is attempted to be closed twice.
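To make the double-close problem concrete, a minimal sketch (the variable names and the use of os.devnull are illustrative only, not the patch's actual code):

    import os

    fd = os.open(os.devnull, os.O_RDONLY)
    handles = {"a": fd, "b": fd}      # two names, one file descriptor
    for h in set(handles.values()):   # deduplicate, as the set-based code does
        os.close(h)
    # Iterating over handles.values() without deduplication would call
    # os.close(fd) twice; the second call raises OSError (EBADF).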
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1609282&group_id=5470 From noreply at sourceforge.net Sun Jan 7 11:58:43 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 02:58:43 -0800 Subject: [Patches] [ python-Patches-1603907 ] subprocess: error redirecting i/o from non-console process Message-ID: Patches item #1603907, was opened at 2006-11-27 18:20 Message generated for change (Comment added) made by astrand You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Oren Tirosh (orenti) Assigned to: Peter Åstrand (astrand) Summary: subprocess: error redirecting i/o from non-console process Initial Comment: In IDLE, PythonWin or other non-console interactive Python under Windows:

    >>> from subprocess import *
    >>> Popen('cmd', stdout=PIPE)
    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        Popen('', stdout=PIPE)
      File "C:\python24\lib\subprocess.py", line 533, in __init__
        (p2cread, p2cwrite,
      File "C:\python24\lib\subprocess.py", line 593, in _get_handles
        p2cread = self._make_inheritable(p2cread)
      File "C:\python24\lib\subprocess.py", line 634, in _make_inheritable
        DUPLICATE_SAME_ACCESS)
    TypeError: an integer is required

The same command in a console window is successful. Why it happens: subprocess assumes that GetStdHandle always succeeds, but when there is no console it returns None. DuplicateHandle then complains about getting a non-integer. This problem does not happen when redirecting all three standard handles. Solution: replace None with -1 (INVALID_HANDLE_VALUE) in _make_inheritable. Patch attached. ---------------------------------------------------------------------- >Comment By: Peter Åstrand (astrand) Date: 2007-01-07 11:58 Message: Logged In: YES user_id=344921 Originator: NO This patch looks very interesting. However, it feels a little bit strange to call DuplicateHandle with a handle of -1. Is this really allowed? What will DuplicateHandle return in this case? INVALID_HANDLE_VALUE? In that case, isn't it better to return INVALID_HANDLE_VALUE directly? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 From noreply at sourceforge.net Sun Jan 7 19:09:42 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 10:09:42 -0800 Subject: [Patches] [ python-Patches-1603907 ] subprocess: error redirecting i/o from non-console process Message-ID: Patches item #1603907, was opened at 2006-11-27 17:20 Message generated for change (Comment added) made by orenti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Oren Tirosh (orenti) Assigned to: Peter Åstrand (astrand) Summary: subprocess: error redirecting i/o from non-console process Initial Comment: In IDLE, PythonWin or other non-console interactive Python under Windows:

    >>> from subprocess import *
    >>> Popen('cmd', stdout=PIPE)
    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        Popen('', stdout=PIPE)
      File "C:\python24\lib\subprocess.py", line 533, in __init__
        (p2cread, p2cwrite,
      File "C:\python24\lib\subprocess.py", line 593, in _get_handles
        p2cread = self._make_inheritable(p2cread)
      File "C:\python24\lib\subprocess.py", line 634, in _make_inheritable
        DUPLICATE_SAME_ACCESS)
    TypeError: an integer is required

The same command in a console window is successful. Why it happens: subprocess assumes that GetStdHandle always succeeds, but when there is no console it returns None. DuplicateHandle then complains about getting a non-integer. This problem does not happen when redirecting all three standard handles. Solution: replace None with -1 (INVALID_HANDLE_VALUE) in _make_inheritable. Patch attached. ---------------------------------------------------------------------- >Comment By: Oren Tirosh (orenti) Date: 2007-01-07 18:09 Message: Logged In: YES user_id=562624 Originator: YES If you duplicate INVALID_HANDLE_VALUE you get a new valid handle to nothing :-) I guess the code really should not rely on this undocumented behavior. The reason I didn't return INVALID_HANDLE_VALUE directly is that DuplicateHandle returns a _subprocess_handle object, not an int. It's expected to have a .Close() method elsewhere in the code. Because of subtle differences in the behavior of the _subprocess and win32api implementations of GetStdHandle in this case, solving this issue gets quite messy! File Added: subprocess-noconsole2.patch ---------------------------------------------------------------------- Comment By: Peter Åstrand (astrand) Date: 2007-01-07 10:58 Message: Logged In: YES user_id=344921 Originator: NO This patch looks very interesting. However, it feels a little bit strange to call DuplicateHandle with a handle of -1. Is this really allowed? What will DuplicateHandle return in this case? INVALID_HANDLE_VALUE? In that case, isn't it better to return INVALID_HANDLE_VALUE directly? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 From noreply at sourceforge.net Sun Jan 7 19:13:19 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 10:13:19 -0800 Subject: [Patches] [ python-Patches-1603907 ] subprocess: error redirecting i/o from non-console process Message-ID: Patches item #1603907, was opened at 2006-11-27 17:20 Message generated for change (Comment added) made by orenti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Oren Tirosh (orenti) Assigned to: Peter Åstrand (astrand) Summary: subprocess: error redirecting i/o from non-console process Initial Comment: In IDLE, PythonWin or other non-console interactive Python under Windows:

    >>> from subprocess import *
    >>> Popen('cmd', stdout=PIPE)
    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        Popen('', stdout=PIPE)
      File "C:\python24\lib\subprocess.py", line 533, in __init__
        (p2cread, p2cwrite,
      File "C:\python24\lib\subprocess.py", line 593, in _get_handles
        p2cread = self._make_inheritable(p2cread)
      File "C:\python24\lib\subprocess.py", line 634, in _make_inheritable
        DUPLICATE_SAME_ACCESS)
    TypeError: an integer is required

The same command in a console window is successful. Why it happens: subprocess assumes that GetStdHandle always succeeds, but when there is no console it returns None. DuplicateHandle then complains about getting a non-integer. This problem does not happen when redirecting all three standard handles. Solution: replace None with -1 (INVALID_HANDLE_VALUE) in _make_inheritable. Patch attached. ---------------------------------------------------------------------- >Comment By: Oren Tirosh (orenti) Date: 2007-01-07 18:13 Message: Logged In: YES user_id=562624 Originator: YES Oops. The new patch does not solve it in all cases in the win32api version, either... ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2007-01-07 18:09 Message: Logged In: YES user_id=562624 Originator: YES If you duplicate INVALID_HANDLE_VALUE you get a new valid handle to nothing :-) I guess the code really should not rely on this undocumented behavior. The reason I didn't return INVALID_HANDLE_VALUE directly is that DuplicateHandle returns a _subprocess_handle object, not an int. It's expected to have a .Close() method elsewhere in the code. Because of subtle differences in the behavior of the _subprocess and win32api implementations of GetStdHandle in this case, solving this issue gets quite messy! File Added: subprocess-noconsole2.patch ---------------------------------------------------------------------- Comment By: Peter Åstrand (astrand) Date: 2007-01-07 10:58 Message: Logged In: YES user_id=344921 Originator: NO This patch looks very interesting. However, it feels a little bit strange to call DuplicateHandle with a handle of -1. Is this really allowed? What will DuplicateHandle return in this case? INVALID_HANDLE_VALUE? In that case, isn't it better to return INVALID_HANDLE_VALUE directly? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 From noreply at sourceforge.net Sun Jan 7 19:24:24 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 10:24:24 -0800 Subject: [Patches] [ python-Patches-1628205 ] socket.readline() interface doesn't handle EINTR properly Message-ID: Patches item #1628205, was opened at 2007-01-04 21:37 Message generated for change (Comment added) made by orenti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Maxim Sobolev (sobomax) Assigned to: Nobody/Anonymous (nobody) Summary: socket.readline() interface doesn't handle EINTR properly Initial Comment: The socket.readline() interface doesn't handle EINTR properly. Currently, when EINTR is received, the exception is not handled and all data that was in the buffer is lost. There is no way to recover that data from the code that uses the interface. The correct behaviour would be to catch EINTR and restart recv(). A patch is attached. The following is a real-world example of how it affects the httplib module:

    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1096, in __call__
      return self.__send(self.__name, args)
    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1383, in __request
      verbose=self.__verbose
    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1131, in request
      errcode, errmsg, headers = h.getreply()
    File "/usr/local/lib/python2.4/httplib.py", line 1137, in getreply
      response = self._conn.getresponse()
    File "/usr/local/lib/python2.4/httplib.py", line 866, in getresponse
      response.begin()
    File "/usr/local/lib/python2.4/httplib.py", line 336, in begin
      version, status, reason = self._read_status()
    File "/usr/local/lib/python2.4/httplib.py", line 294, in _read_status
      line = self.fp.readline()
    File "/usr/local/lib/python2.4/socket.py", line 325, in readline
      data = recv(1)
    error: (4, 'Interrupted system call')

-Maxim ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2007-01-07 18:24 Message: Logged In: YES user_id=562624 Originator: NO You may have encountered this on sockets, but *no* Python I/O handles restart on EINTR. The right place to fix this is probably in C, not the Python library. The places where an I/O operation could be interrupted are practically anywhere the GIL is released. This kind of change is likely to be controversial. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 From noreply at sourceforge.net Sun Jan 7 21:36:07 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 12:36:07 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 14:36 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases.
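The idea behind such a class can be sketched in a few lines: buffer data in memory first, then roll over to a real temporary file once the data grows past a threshold. The sketch below only illustrates that concept; it is not djmitche's actual patch, and the class and attribute names are invented:

    from cStringIO import StringIO
    import tempfile

    class SpooledFileSketch(object):
        """Keep data in memory until it exceeds max_size, then move to disk."""

        def __init__(self, max_size=1024):
            self._file = StringIO()
            self._max_size = max_size
            self._rolled = False

        def write(self, data):
            self._file.write(data)
            if not self._rolled and self._file.tell() > self._max_size:
                disk = tempfile.TemporaryFile()
                disk.write(self._file.getvalue())  # copy the spooled data
                self._file = disk
                self._rolled = True

        def __getattr__(self, name):
            # delegate read/seek/close/etc. to the underlying file object
            return getattr(self._file, name)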
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Sun Jan 7 21:37:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 12:37:22 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 14:36 Message generated for change (Comment added) made by djmitche You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases. ---------------------------------------------------------------------- >Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-07 14:37 Message: Logged In: YES user_id=7446 Originator: YES File Added: SpooledTemporaryFile.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Sun Jan 7 22:34:32 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 13:34:32 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 06:51 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects?
To get some data, you can measure a run of a test suite, or a run of IDLE, or of compileall. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Sun Jan 7 23:03:49 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 14:03:49 -0800 Subject: [Patches] [ python-Patches-1597850 ] Cross compiling patches for MINGW Message-ID: Patches item #1597850, was opened at 2006-11-16 16:57 Message generated for change (Comment added) made by rmt38 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Han-Wen Nienhuys (hanwen) Assigned to: Nobody/Anonymous (nobody) Summary: Cross compiling patches for MINGW Initial Comment: Hello, the attached tarball is a patch bomb of 32 patches against Python 2.5 that we lilypond developers use for cross-compiling Python. The patches were originally written by Jan Nieuwenhuizen, my co-developer. These patches have been tested with Linux/x86, linux/x64 and macos 10.3 as build host and linux-{ppc,x86,x86_64}, freebsd, mingw as target platform. All packages at lilypond.org/install/ except for darwin contain the x-compiled python. Each patch is prefixed with a small comment, but for reference, I include a snippet from the readme. It would be nice if at least some of the patches were included. In particular, I think that X-compiling is a common request, so it warrants inclusion. Basically, what we do is override autoconf and Makefile settings through setting environment variables. **README section** Cross Compiling --------------- Python can be cross compiled by supplying different --build and --host parameters to configure. Python is compiled on the "build" system and executed on the "host" system. Cross compiling python requires a native Python on the build host, and a natively compiled tool `Pgen'. Before cross compiling, Python must first be compiled and installed on the build host. The configure script will use `cc' and `python', or environment variables CC_FOR_BUILD or PYTHON_FOR_BUILD, eg:

    CC_FOR_BUILD=gcc-3.3 \
    PYTHON_FOR_BUILD=python2.4 \
    .../configure --build=i686-linux --host=i586-mingw32

Cross compiling has been tested under linux, mileage may vary for other platforms. A few reminders on using configure to cross compile:
- Cross compile tools must be in PATH,
- Cross compile tools must be prefixed with the host type (ie i586-mingw32-gcc, i586-mingw32-ranlib, ...),
- CC, CXX, AR, and RANLIB must be undefined when running configure, they will be auto-detected.
If you need a cross compiler, Debian ships several (eg: avr, m68hc1x, mingw32), while dpkg-cross easily creates others. Otherwise, check out Dan Kegel's crosstool: http://www.kegel.com/crosstool . ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 22:03 Message: Logged In: YES user_id=1417949 Originator: NO config.cache is not generated or used on my Windows installation of MinGW unless --config-cache is also given as an argument to configure, and from the autoconf documentation this seems to be the default behaviour.
So you might want to amend the instructions to take that into account. Isn't requiring the user to manually create and edit config.cache unnecessary work and confusion for them when it can be addressed in configure.in? Given that checking files is an operation which does not work when cross_compiling is set, and that these checks make configure exit, configure.in could test cross_compiling before trying them and skip them, allowing configure to complete. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 01:50 Message: Logged In: YES user_id=1417949 Originator: NO This:

    AC_CHECK_FILE(/dev/ptmx, AC_DEFINE(HAVE_DEV_PTMX, 1, [Define if we have /dev/ptmx.]))

is being translated into:

    echo "$as_me:$LINENO: checking for /dev/ptmx" >&5
    echo $ECHO_N "checking for /dev/ptmx... $ECHO_C" >&6
    if test "${ac_cv_file__dev_ptmx+set}" = set; then
      echo $ECHO_N "(cached) $ECHO_C" >&6
    else
      test "$cross_compiling" = yes &&
        { { echo "$as_me:$LINENO: error: cannot check for file existence when cross compiling" >&5
      echo "$as_me: error: cannot check for file existence when cross compiling" >&2;}
        { (exit 1); exit 1; }; }
      if test -r "/dev/ptmx"; then
        ac_cv_file__dev_ptmx=yes
      else
        ac_cv_file__dev_ptmx=no
      fi
    fi

which exits when I do:

    $ export CC_FOR_BUILD=gcc
    $ sh configure --host=arm-eabi

with an error like:

    checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling

I am using the latest version of msys/mingw with devkitarm to cross compile. Is this supposed to happen? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:50 Message: Logged In: YES user_id=161998 Originator: YES this is a patch against an SVN checkout of last week. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:48 Message: Logged In: YES user_id=161998 Originator: YES With cross.patch I've been able to build a working freebsd python on linux. Since you had few problems with the X-compile patches, I'm resubmitting those first. I'd like to give our (admittedly: oddball) mingw version another go when the X-compile patches are in python SVN. Regarding your comments: * what would be a better way to import the SO setting? the most reliable way to get something out of a makefile into python is

    VAR=foo
    export VAR
    ..
    os.environ['VAR']

this doesn't introduce any fragility in parsing/expanding/(un)quoting, so it's actually pretty good. Right now, I'm overriding sysconfig wholesale in setup.py with a sysconfig._config_vars.update (os.environ) but I'm not sure that this affects the settings in build_ext.py.
A freebsd -> linux compile does not touch that code, so if you dislike it, we can leave it out. * I've documented the .x extension File Added: cross.patch ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:12 Message: Logged In: YES user_id=21627 Originator: NO One more note: it would be best if the patches were against the subversion trunk. They won't be included in the 2.5 maintenance branch (as they are a new feature), so they need to be ported to the trunk, anyway. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:06 Message: Logged In: YES user_id=21627 Originator: NO I'll add my comments as I go through the patches.

cab1e7d1e54d14a8aab52f0c3b3073c93f75d4fc:
- why is there now a mingw32msvc2 platform? If the target is mingw (rather than Cygwin), I'd expect that the target is just Win32/Windows, and that all symbolic constants provided be usable across all Win32 Pythons.
- why is h2py run for /usr/include/netinet/in.h? Shouldn't it operate on a target header file?
- please include any plat-* files that you generate in the patch.
- why do you need dl_nt.c in Modules? Please make it use the one from PC (consider updating the comment about calling initall)

b52dbbbbc3adece61496b161d8c22599caae2311
- please combine all patches adding support for __MINGW32__ into a single one. Why is anything needed here at all? I thought Python compiles already with mingw32 (on Windows)?
- what is the exclusion of freezing for?

059af829d362b10bb5921367c93a56dbb51ef31b
- Why are you taking timeval from winsock2.h? It should come from sys/time.h, and does in my copy of Debian mingw32-runtime.

6a742fb15b28564f9a1bc916c76a28dc672a9b2c
- Why are these changes needed? It's Windows, and that is already supported.

a838b4780998ef98ae4880c3916274d45b661c82
- Why doesn't that already work on Windows+cygwin+mingw32?

f452fe4b95085d8c1ba838bf302a6a48df3c1d31
- I think this should target msvcr71.dll, not msvcrt.dll. Please also combine the cross-compilation patches into a single one.
- there is no need to provide pyconfig.h.in changes; I'll regenerate that, anyway.

9c022e407c366c9f175e9168542ccc76eae9e3f0
- please integrate those into the large AC_CHECK_FUNCS that already exists

540684d696df6057ee2c9c4e13e33fe450605ffa
- Why are you stripping -Wl?

64f5018e975419b2d37c39f457c8732def3288df
- Try getting SO from the Makefile, not from the environment (I assume this is also meant to support true distutils packages some day).

7a4e50fb1cf5ff3481aaf7515a784621cbbdac6c
- again: what is the "mingw" platform?

7d3a45788a0d83608d10e5c0a34f08b426d62e92
- is this really necessary? I suggest dropping it

23a2dd14933a2aee69f7cdc9f838e4b9c26c1eea
- don't include bits/time.h; it's not meant for direct inclusion

6689ca9dea07afbe8a77b7787a5c4e1642f803a1
- what's a .x file?

---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-25 15:12 Message: Logged In: YES user_id=161998 Originator: YES I've sent the agreement by snailmail. ---------------------------------------------------------------------- Comment By: Jan Nieuwenhuizen (janneke-sf) Date: 2006-11-17 19:57 Message: Logged In: YES user_id=1368960 Originator: NO I do not mind either. I've just signed and faxed contrib-form.html.
---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:33 Message: Logged In: YES user_id=161998 Originator: YES note that not all of the patch needs to go in its current form. In particular, setup.py should be much more clever about looking into the build root to find libs and include files. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:32 Message: Logged In: YES user_id=161998 Originator: YES I don't mind, and I expect Jan won't have a problem either. What's the procedure: do we send the disclaimer first, or do you do the review, or does everything happen in parallel? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-11-16 21:47 Message: Logged In: YES user_id=21627 Originator: NO Would you and Jan Nieuwenhuizen be willing to sign the contributor agreement, at http://www.python.org/psf/contrib.html I haven't reviewed the patch yet; if they can be integrated, that will only happen in the trunk (i.e. not for 2.5.x). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 From noreply at sourceforge.net Mon Jan 8 04:02:03 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 19:02:03 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and testcases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 04:02:24 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 19:02:24 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and testcases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 04:34:35 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 19:34:35 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and testcases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated.
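For context, a minimal sketch of the cycle this translation is meant to break. It assumes PEP 344 semantics, where a caught exception carries a __traceback__ attribute; the example itself is not from the patch:

    try:
        raise ValueError("boom")
    except ValueError, e:
        pass
    # Without the implicit "e = None; del e", the frame's locals would keep
    # "e" alive, e.__traceback__ would keep the frame alive, and the frame's
    # locals include "e" again -- a reference cycle that delays resource
    # release until the cyclic garbage collector runs.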
---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 04:56:53 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 19:56:53 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 00:51 Message generated for change (Comment added) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-07 22:56 Message: Logged In: YES user_id=80475 Originator: NO I recommend against this. Any additional specialization code will necessarily slow down other cases handled by PyObject_GetItem. So, the merits of speeding up tuple indexing need to be weighed against the costs (slowing down other code and the excess loading of ceval.c with specialization code). Also, I reject the premise that there is no conceptual difference between list and tuple indexing. The former is a primary use case for lists and the latter is only incidental to tuple use cases (see the endless discussions on python-dev and comp.lang.python about why tuples are not to be regarded as immutable lists and in fact have a different intended set of uses). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-07 16:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects? To get some data, you can measure a run of a test suite, or a run of IDLE, or of compileall.
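The user-visible gap the submitter reports can be reproduced from Python without touching ceval.c; a rough sketch (absolute numbers are machine-dependent, and this only approximates the interpreter-level statistics requested above):

    import timeit
    # list indexing hits the inline shortcut in ceval.c's BINARY_SUBSCR
    print timeit.Timer('a[0]', 'a = [0]').timeit()
    # tuple indexing falls through to the generic PyObject_GetItem path
    print timeit.Timer('a[0]', 'a = (0,)').timeit()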
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Mon Jan 8 06:49:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 21:49:51 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 14:51 Message generated for change (Comment added) made by ocean-city You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- >Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 14:49 Message: Logged In: YES user_id=1200846 Originator: YES Sorry, I want to withdraw this. Python/lib/test/testall.py ===> list: 2541719, tuple: 620815, other: 6174214. The ratio of tuples seems relatively low. File Added: statistics.patch ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-08 12:56 Message: Logged In: YES user_id=80475 Originator: NO I recommend against this. Any additional specialization code will necessarily slow down other cases handled by PyObject_GetItem. So, the merits of speeding up tuple indexing need to be weighed against the costs (slowing down other code and the excess loading of ceval.c with specialization code). Also, I reject the premise that there is no conceptual difference between list and tuple indexing. The former is a primary use case for lists and the latter is only incidental to tuple use cases (see the endless discussions on python-dev and comp.lang.python about why tuples are not to be regarded as immutable lists and in fact have a different intended set of uses). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 06:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Mon Jan 8 07:58:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 22:58:11 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 14:51 Message generated for change (Comment added) made by ocean-city You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- >Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 15:58 Message: Logged In: YES user_id=1200846 Originator: YES >see the endless discussions on python-dev... Thank you, rhettinger. I'm interested in them; I'll read them. ---------------------------------------------------------------------- Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 14:49 Message: Logged In: YES user_id=1200846 Originator: YES Sorry, I want to withdraw this. Python/lib/test/testall.py ===> list: 2541719, tuple: 620815, other: 6174214. The ratio of tuples seems relatively low. File Added: statistics.patch ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-08 12:56 Message: Logged In: YES user_id=80475 Originator: NO I recommend against this. Any additional specialization code will necessarily slow down other cases handled by PyObject_GetItem. So, the merits of speeding up tuple indexing need to be weighed against the costs (slowing down other code and the excess loading of ceval.c with specialization code). Also, I reject the premise that there is no conceptual difference between list and tuple indexing. The former is a primary use case for lists and the latter is only incidental to tuple use cases (see the endless discussions on python-dev and comp.lang.python about why tuples are not to be regarded as immutable lists and in fact have a different intended set of uses). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 06:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects?
To get some data, you can measure a run of a test suite, or a run of IDLE, or of compileall. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Mon Jan 8 08:19:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 23:19:33 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 06:51 Message generated for change (Settings changed) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 07:58 Message: Logged In: YES user_id=1200846 Originator: YES >see the endless discussions on python-dev... Thank you, rhettinger. I'm interested in them; I'll read them. ---------------------------------------------------------------------- Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 06:49 Message: Logged In: YES user_id=1200846 Originator: YES Sorry, I want to withdraw this. Python/lib/test/testall.py ===> list: 2541719, tuple: 620815, other: 6174214. The ratio of tuples seems relatively low. File Added: statistics.patch ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-08 04:56 Message: Logged In: YES user_id=80475 Originator: NO I recommend against this. Any additional specialization code will necessarily slow down other cases handled by PyObject_GetItem. So, the merits of speeding up tuple indexing need to be weighed against the costs (slowing down other code and the excess loading of ceval.c with specialization code). Also, I reject the premise that there is no conceptual difference between list and tuple indexing. The former is a primary use case for lists and the latter is only incidental to tuple use cases (see the endless discussions on python-dev and comp.lang.python about why tuples are not to be regarded as immutable lists and in fact have a different intended set of uses). ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects? To get some data, you can measure a run of a test suite, or a run of IDLE, or of compileall. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Mon Jan 8 08:45:06 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 23:45:06 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 15:24 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 code page for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created the corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 08:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 08:59:08 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 23:59:08 -0800 Subject: [Patches] [ python-Patches-1597850 ] Cross compiling patches for MINGW Message-ID: Patches item #1597850, was opened at 2006-11-16 16:57 Message generated for change (Comment added) made by hanwen You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Han-Wen Nienhuys (hanwen) Assigned to: Nobody/Anonymous (nobody) Summary: Cross compiling patches for MINGW Initial Comment: Hello, the attached tarball is a patch bomb of 32 patches against Python 2.5 that we lilypond developers use for cross-compiling Python. The patches were originally written by Jan Nieuwenhuizen, my co-developer. These patches have been tested with Linux/x86, linux/x64 and macos 10.3 as build host and linux-{ppc,x86,x86_64}, freebsd, mingw as target platform. All packages at lilypond.org/install/ except for darwin contain the x-compiled python.
Each patch is prefixed with a small comment, but for reference, I include a snippet from the readme. It would be nice if at least some of the patches were included. In particular, I think that X-compiling is a common request, so it warrants inclusion. Basically, what we do is override autoconf and Makefile settings through setting environment variables. **README section** Cross Compiling --------------- Python can be cross compiled by supplying different --build and --host parameters to configure. Python is compiled on the "build" system and executed on the "host" system. Cross compiling python requires a native Python on the build host, and a natively compiled tool `Pgen'. Before cross compiling, Python must first be compiled and installed on the build host. The configure script will use `cc' and `python', or environment variables CC_FOR_BUILD or PYTHON_FOR_BUILD, eg:

    CC_FOR_BUILD=gcc-3.3 \
    PYTHON_FOR_BUILD=python2.4 \
    .../configure --build=i686-linux --host=i586-mingw32

Cross compiling has been tested under linux, mileage may vary for other platforms. A few reminders on using configure to cross compile:
- Cross compile tools must be in PATH,
- Cross compile tools must be prefixed with the host type (ie i586-mingw32-gcc, i586-mingw32-ranlib, ...),
- CC, CXX, AR, and RANLIB must be undefined when running configure, they will be auto-detected.
If you need a cross compiler, Debian ships several (eg: avr, m68hc1x, mingw32), while dpkg-cross easily creates others. Otherwise, check out Dan Kegel's crosstool: http://www.kegel.com/crosstool . ---------------------------------------------------------------------- >Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-08 07:59 Message: Logged In: YES user_id=161998 Originator: YES Regarding --config-cache, yes, you're correct. Regarding extending configure.in, it does already say "configure: error: cannot check for file existence when cross compiling" and exits. What more would you like it to do? I could add a check that --config-cache is given, although that is not strictly necessary (you can also set the variables in the environment). ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 22:03 Message: Logged In: YES user_id=1417949 Originator: NO config.cache is not generated or used on my Windows installation of MinGW unless --config-cache is also given as an argument to configure, and from the autoconf documentation this seems to be the default behaviour. So you might want to amend the instructions to take that into account. Isn't requiring the user to manually create and edit config.cache unnecessary work and confusion for them when it can be addressed in configure.in? Given that checking files is an operation which does not work when cross_compiling is set, and that these checks make configure exit, configure.in could test cross_compiling before trying them and skip them, allowing configure to complete. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx...
configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 01:50 Message: Logged In: YES user_id=1417949 Originator: NO This: AC_CHECK_FILE(/dev/ptmx, AC_DEFINE(HAVE_DEV_PTMX, 1, [Define if we have /dev/ptmx.])) Is being translated into:

    echo "$as_me:$LINENO: checking for /dev/ptmx" >&5
    echo $ECHO_N "checking for /dev/ptmx... $ECHO_C" >&6
    if test "${ac_cv_file__dev_ptmx+set}" = set; then
      echo $ECHO_N "(cached) $ECHO_C" >&6
    else
      test "$cross_compiling" = yes &&
        { { echo "$as_me:$LINENO: error: cannot check for file existence when cross compiling" >&5
        echo "$as_me: error: cannot check for file existence when cross compiling" >&2;}
        { (exit 1); exit 1; }; }
      if test -r "/dev/ptmx"; then
        ac_cv_file__dev_ptmx=yes
      else
        ac_cv_file__dev_ptmx=no
      fi
    fi

Which exits when I do: $ export CC_FOR_BUILD=gcc $ sh configure --host=arm-eabi With an error like: checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling I am using the latest version of msys/mingw with devkitarm to cross compile. Is this supposed to happen? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:50 Message: Logged In: YES user_id=161998 Originator: YES this is a patch against an SVN checkout of last week. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:48 Message: Logged In: YES user_id=161998 Originator: YES With cross.patch I've been able to build a working freebsd python on linux. Since you had few problems with the X-compile patches, I'm resubmitting those first. I'd like to give our (admittedly: oddball) mingw version another go when the X-compile patches are in python SVN. Regarding your comments: * what would be a better way to import the SO setting? the most reliable way to get something out of a makefile into python is

    VAR=foo
    export VAR
    ..
    os.environ['VAR']

this doesn't introduce any fragility in parsing/expanding/(un)quoting, so it's actually pretty good. Right now, I'm overriding sysconfig wholesale in setup.py with a sysconfig._config_vars.update (os.environ) but I'm not sure that this affects the settings in build_ext.py. A freebsd -> linux compile does not touch that code, so if you dislike it, we can leave it out. * I've documented the .x extension File Added: cross.patch ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:12 Message: Logged In: YES user_id=21627 Originator: NO One more note: it would be best if the patches were against the subversion trunk. They won't be included in the 2.5 maintenance branch (as they are a new feature), so they need to be ported to the trunk, anyway. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2006-12-06 20:06 Message: Logged In: YES user_id=21627 Originator: NO I'll add my comments as I go through the patches. cab1e7d1e54d14a8aab52f0c3b3073c93f75d4fc: - why is there now a mingw32msvc2 platform? If the target is mingw (rather than Cygwin), I'd expect that the target is just Win32/Windows, and that all symbolic constants provided be usable across all Win32 Pythons. - why is h2py run for /usr/include/netinet/in.h? Shouldn't it operate on a target header file? - please include any plat-* files that you generate in the patch. - why do you need dl_nt.c in Modules? Please make it use the one from PC (consider updating the comment about calling initall) b52dbbbbc3adece61496b161d8c22599caae2311 - please combine all patches adding support for __MINGW32__ into a single one. Why is anything needed here at all? I thought Python compiles already with mingw32 (on Windows)? - what is the exclusion of freezing for? 059af829d362b10bb5921367c93a56dbb51ef31b - Why are you taking timeval from winsock2.h? It should come from sys/time.h, and does in my copy of Debian mingw32-runtime. 6a742fb15b28564f9a1bc916c76a28dc672a9b2c - Why are these changes needed? It's Windows, and that is already supported. a838b4780998ef98ae4880c3916274d45b661c82 - Why doesn't that already work on Windows+cygwin+mingw32? f452fe4b95085d8c1ba838bf302a6a48df3c1d31 - I think this should target msvcr71.dll, not msvcrt.dll Please also combine the cross-compilation patches into a single one. - there is no need to provide pyconfig.h.in changes; I'll regenerate that, anyway. 9c022e407c366c9f175e9168542ccc76eae9e3f0 - please integrate those into the large AC_CHECK_FUNCS that already exists 540684d696df6057ee2c9c4e13e33fe450605ffa - Why are you stripping -Wl? 64f5018e975419b2d37c39f457c8732def3288df - Try getting SO from the Makefile, not from the environment (I assume this is also meant to support true distutils packages some day). 7a4e50fb1cf5ff3481aaf7515a784621cbbdac6c - again: what is the "mingw" platform? 7d3a45788a0d83608d10e5c0a34f08b426d62e92 - is this really necessary? I suggest dropping it 23a2dd14933a2aee69f7cdc9f838e4b9c26c1eea - don't include bits/time.h; it's not meant for direct inclusion 6689ca9dea07afbe8a77b7787a5c4e1642f803a1 - what's a .x file? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-25 15:12 Message: Logged In: YES user_id=161998 Originator: YES I've sent the agreement by snailmail. ---------------------------------------------------------------------- Comment By: Jan Nieuwenhuizen (janneke-sf) Date: 2006-11-17 19:57 Message: Logged In: YES user_id=1368960 Originator: NO I do not mind either. I've just signed and faxed contrib-form.html. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:33 Message: Logged In: YES user_id=161998 Originator: YES note that not all of the patch needs to go in its current form. In particular, setup.py should be much cleverer about looking in the build root for finding libs and include files. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:32 Message: Logged In: YES user_id=161998 Originator: YES I don't mind, and I expect Jan won't have a problem either. What's the procedure: do we send the disclaimer first, or do you do the review, or does everything happen in parallel?
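For readers following the config.cache discussion above: a minimal, hypothetical helper that pre-seeds the autoconf cache before running configure. Only the cache variable ac_cv_file__dev_ptmx and the CC_FOR_BUILD/PYTHON_FOR_BUILD variables come from this thread; the script itself is an illustration, not part of the submitted patches.

    # Hypothetical helper: seed config.cache with answers configure cannot
    # determine when cross-compiling, then run configure with --config-cache.
    import os
    import subprocess

    SEED = {
        # mingw targets have no /dev/ptmx, so "no" is the right answer there;
        # adjust for your own target platform.
        "ac_cv_file__dev_ptmx": "no",
    }

    def write_config_cache(path="config.cache"):
        f = open(path, "a")
        for name, value in SEED.items():
            f.write("%s=%s\n" % (name, value))  # config.cache is sourced shell
        f.close()

    def run_configure(build, host):
        env = dict(os.environ)
        env["CC_FOR_BUILD"] = "gcc"            # native compiler on the build host
        env["PYTHON_FOR_BUILD"] = "python2.4"  # native Python, as the README says
        subprocess.call(["sh", "configure", "--config-cache",
                         "--build=" + build, "--host=" + host], env=env)

    if __name__ == "__main__":
        write_config_cache()
        run_configure("i686-linux", "i586-mingw32")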
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-11-16 21:47 Message: Logged In: YES user_id=21627 Originator: NO Would you and Jan Nieuwenhuizen be willing to sign the contributor agreement, at http://www.python.org/psf/contrib.html I haven't reviewed the patch yet; if they can be integrated, that will only happen in the trunk (i.e. not for 2.5.x). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 From noreply at sourceforge.net Mon Jan 8 09:26:44 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 00:26:44 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 20:36 Message generated for change (Comment added) made by arigo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases. ---------------------------------------------------------------------- >Comment By: Armin Rigo (arigo) Date: 2007-01-08 08:26 Message: Logged In: YES user_id=4771 Originator: NO The __getattr__ magic makes the following kind of code fail with SpooledTemporaryFile:

    f = SpooledTemporaryFile(max_size=something)
    rd = f.read
    wr = f.write
    for x in y:
        ...use rd(size) and wr(data)...

The problem is that the captured 'f.read' method is the one from the StringIO instance, even after the write() rolled the file over to disk. Given that capturing bound methods is a semi-official speed hack advertised in some respected places, we might have to be careful about it. About such matters I am biased towards first getting it right and then getting it fast... Also, Python 2.5 is already out, so this will probably be a 2.6 addition. ---------------------------------------------------------------------- Comment By: Dustin J.
Mitchell (djmitche) Date: 2007-01-07 20:37 Message: Logged In: YES user_id=7446 Originator: YES File Added: SpooledTemporaryFile.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Mon Jan 8 11:26:39 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 02:26:39 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 15:24 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 codepage for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created a corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 11:26 Message: Logged In: YES user_id=38388 Originator: NO Please provide a reference defining the encoding. The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx but that doesn't provide the mapping table. Thanks. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 08:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 11:51:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 02:51:51 -0800 Subject: [Patches] [ python-Patches-1628205 ] socket.readline() interface doesn't handle EINTR properly Message-ID: Patches item #1628205, was opened at 2007-01-04 13:37 Message generated for change (Comment added) made by sobomax You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Maxim Sobolev (sobomax) Assigned to: Nobody/Anonymous (nobody) Summary: socket.readline() interface doesn't handle EINTR properly Initial Comment: The socket.readline() interface doesn't handle EINTR properly. Currently, when EINTR is received, the exception is not handled and all data that was in the buffer is lost. There is no way to recover that data from the code that uses the interface.
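For illustration, the fix described next - catch EINTR and restart recv() - can be sketched in pure Python. This is a sketch of the idea only, not the attached patch; the helper name is hypothetical.

    # Restart recv() when it is interrupted by a signal, so buffered data
    # is not lost. Illustrative only; the real patch changes socket.py.
    import errno
    import socket

    def recv_retry(sock, size):
        while True:
            try:
                return sock.recv(size)
            except socket.error, why:
                if why[0] == errno.EINTR:
                    continue  # interrupted by a signal: just retry
                raise         # any other error is propagated unchanged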
Correct behaviour would be to catch EINTR and restart recv(). Patch is attached. Following is a real-world example of how it affects the httplib module:

    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1096, in __call__
        return self.__send(self.__name, args)
    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1383, in __request
        verbose=self.__verbose
    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1131, in request
        errcode, errmsg, headers = h.getreply()
    File "/usr/local/lib/python2.4/httplib.py", line 1137, in getreply
        response = self._conn.getresponse()
    File "/usr/local/lib/python2.4/httplib.py", line 866, in getresponse
        response.begin()
    File "/usr/local/lib/python2.4/httplib.py", line 336, in begin
        version, status, reason = self._read_status()
    File "/usr/local/lib/python2.4/httplib.py", line 294, in _read_status
        line = self.fp.readline()
    File "/usr/local/lib/python2.4/socket.py", line 325, in readline
        data = recv(1)
    error: (4, 'Interrupted system call')

-Maxim ---------------------------------------------------------------------- >Comment By: Maxim Sobolev (sobomax) Date: 2007-01-08 02:51 Message: Logged In: YES user_id=24670 Originator: YES Well, it's not quite correct, since for example httplib.py tries to handle EINTR. The fundamental problem with socket.readline() is that it does internal buffering, so that getting EINTR results in data being lost. I don't think it has to be fixed in C, since recv() is a very low-level interface and it is expected to return EINTR on a signal, so that "fixing" it there could possibly break software that relies on this behaviour. And I don't quite buy your reasoning - "since it's broken in a few more places, let's keep it consistently broken everywhere". To me it sounds like an attempt to bury one's head in the sand instead of facing the problem at hand. Fixing socket.readline() may be the first step in improving the library to handle this condition properly. ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2007-01-07 10:24 Message: Logged In: YES user_id=562624 Originator: NO You may have encountered this on sockets, but *all* Python I/O fails to handle restart on EINTR. The right place to fix this is probably in C, not the Python library. The places where an I/O operation could be interrupted are practically anywhere the GIL is released. This kind of change is likely to be controversial. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 From noreply at sourceforge.net Mon Jan 8 11:59:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 02:59:50 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 10:37 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000.
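For readers new to the idea, a rough pure-Python model of "lazy concatenation" follows. The real patches implement this inside the C Unicode type; the class below and its render() method are purely illustrative, not code from the patches.

    # '+' is O(1): it only records the operands. The joined string is built
    # once, on demand, with a single linear pass over the pieces.
    class LazyConcat(object):
        def __init__(self, *parts):
            self._parts = list(parts)  # pending pieces, not yet joined
            self._value = None         # rendered result, filled in lazily

        def __add__(self, other):
            return LazyConcat(self, other)  # constant time: no copying here

        def render(self):
            if self._value is None:
                self._value = u"".join(
                    p.render() if isinstance(p, LazyConcat) else p
                    for p in self._parts)
                self._parts = None
            return self._value

    s = LazyConcat(u"spam") + u" and " + u"eggs"
    assert s.render() == u"spam and eggs"  # the join happens here, once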
I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 11:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 06:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Mon Jan 8 13:47:08 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 04:47:08 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 16:24 Message generated for change (Comment added) made by bialix You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 codepage for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created a corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation. ---------------------------------------------------------------------- >Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 14:47 Message: Logged In: YES user_id=957594 Originator: YES When I started working on cp720, I searched Google for it. I found this presentation with an actual map of the characters: http://stanley.cs.toronto.edu/presentations/2005-winter/unicode.ppt Then I tried to search for a CP720.txt file and found this page: http://www.haible.de/bruno/charsets/conversion-tables/Arabic-other.html I downloaded the archive from that page and used its CP720.txt to generate cp720.py. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 12:26 Message: Logged In: YES user_id=38388 Originator: NO Please provide a reference defining the encoding. The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx but that doesn't provide the mapping table. Thanks. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 09:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 14:33:18 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 05:33:18 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 16:24 Message generated for change (Comment added) made by bialix You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 codepage for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created a corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation.
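For illustration, a rough sketch of how a CP720.txt-style mapping file can be turned into the 256-entry decoding table a charmap codec needs. The real integration path, as loewis notes above, is Tools/unicode (its gencodec.py generates the whole codec module); the code below only assumes the common "0xXX 0xYYYY # comment" mapping-file layout and is not from the patch.

    # Build a decoding table from a "byte <tab> unicode <tab> # name" file.
    def load_decoding_table(path):
        table = [u'\ufffe'] * 256          # U+FFFE marks undefined slots
        for line in open(path):
            line = line.split('#', 1)[0].strip()
            if not line:
                continue                   # blank or comment-only line
            fields = line.split()
            if len(fields) < 2:
                continue                   # byte with no Unicode mapping
            byte, code = int(fields[0], 16), int(fields[1], 16)
            table[byte] = unichr(code)
        return u''.join(table)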
---------------------------------------------------------------------- >Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 15:33 Message: Logged In: YES user_id=957594 Originator: YES File Added: CP720.TXT ---------------------------------------------------------------------- Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 14:47 Message: Logged In: YES user_id=957594 Originator: YES When I started working on cp720, I searched Google for it. I found this presentation with an actual map of the characters: http://stanley.cs.toronto.edu/presentations/2005-winter/unicode.ppt Then I tried to search for a CP720.txt file and found this page: http://www.haible.de/bruno/charsets/conversion-tables/Arabic-other.html I downloaded the archive from that page and used its CP720.txt to generate cp720.py. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 12:26 Message: Logged In: YES user_id=38388 Originator: NO Please provide a reference defining the encoding. The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx but that doesn't provide the mapping table. Thanks. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 09:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 14:47:10 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 05:47:10 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 16:24 Message generated for change (Comment added) made by bialix You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 codepage for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created a corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation.
---------------------------------------------------------------------- >Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 15:47 Message: Logged In: YES user_id=957594 Originator: YES Here is the map on the Microsoft site: http://www.microsoft.com/globaldev/reference/oem/720.mspx ---------------------------------------------------------------------- Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 15:33 Message: Logged In: YES user_id=957594 Originator: YES File Added: CP720.TXT ---------------------------------------------------------------------- Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 14:47 Message: Logged In: YES user_id=957594 Originator: YES When I started working on cp720, I searched Google for it. I found this presentation with an actual map of the characters: http://stanley.cs.toronto.edu/presentations/2005-winter/unicode.ppt Then I tried to search for a CP720.txt file and found this page: http://www.haible.de/bruno/charsets/conversion-tables/Arabic-other.html I downloaded the archive from that page and used its CP720.txt to generate cp720.py. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 12:26 Message: Logged In: YES user_id=38388 Originator: NO Please provide a reference defining the encoding. The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx but that doesn't provide the mapping table. Thanks. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 09:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 16:53:26 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 07:53:26 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 14:36 Message generated for change (Comment added) made by djmitche You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases. ---------------------------------------------------------------------- >Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-08 09:53 Message: Logged In: YES user_id=7446 Originator: YES I agree it would break in such a situation, but I'm not clear on which direction your bias leads you (specifically, which do we get right -- don't use bound methods, or don't use the __getattr__ magic?).
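A minimal sketch of the failure mode arigo describes, using a hypothetical toy stand-in for SpooledTemporaryFile; none of these names are from the patch, and the rollover logic is simplified for illustration.

    from StringIO import StringIO
    import tempfile

    class ToySpooledFile(object):
        def __init__(self, max_size):
            self._file = StringIO()
            self._max = max_size

        def __getattr__(self, name):
            return getattr(self._file, name)   # delegate read/seek/etc.

        def write(self, data):
            self._file.write(data)
            if self._file.tell() > self._max:
                real = tempfile.TemporaryFile()
                real.write(self._file.getvalue())
                self._file = real              # rollover: swap backing file

    f = ToySpooledFile(max_size=10)
    rd = f.read                 # bound method of the StringIO backing file
    f.write("x" * 100)          # triggers rollover to a real temp file
    f.seek(0)                   # seeks the on-disk file
    print repr(rd())            # '' - the stale StringIO, data is missed
    print repr(f.read())        # 'xxx...' - a fresh lookup sees the real file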
I could fix this by defining "proxy" functions (and some properties) for the whole file interface, rather than just the methods that potentially trigger rollover. That would lose a little efficiency, but mostly only in reading (e.g., calling f.read() will always result in two function applications; in the current model, after the first call it runs at "native" speed). It would also lose forward compatibility if the file protocol changes, although I'm not sure how likely that is. Would you like me to do that? ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2007-01-08 02:26 Message: Logged In: YES user_id=4771 Originator: NO The __getattr__ magic makes the following kind of code fail with SpooledTemporaryFile:

    f = SpooledTemporaryFile(max_size=something)
    rd = f.read
    wr = f.write
    for x in y:
        ...use rd(size) and wr(data)...

The problem is that the captured 'f.read' method is the one from the StringIO instance, even after the write() rolled the file over to disk. Given that capturing bound methods is a semi-official speed hack advertised in some respected places, we might have to be careful about it. About such matters I am biased towards first getting it right and then getting it fast... Also, Python 2.5 is already out, so this will probably be a 2.6 addition. ---------------------------------------------------------------------- Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-07 14:37 Message: Logged In: YES user_id=7446 Originator: YES File Added: SpooledTemporaryFile.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Mon Jan 8 17:49:42 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 08:49:42 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and test cases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour.
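For illustration, a small sketch of the cycle this translation is meant to break. The code is hypothetical, written in Python 2 style with the traceback captured explicitly; under PEP 344 the exception itself would carry the traceback, so the name 'e' alone would form such a cycle.

    import sys

    def leaky():
        big = "x" * 10 ** 6          # stays alive as long as the frame does
        try:
            raise ValueError("boom")
        except ValueError, e:
            # Storing the traceback in a local creates the cycle by hand:
            # tb -> frame -> locals -> tb. Refcounting alone can no longer
            # free the frame (or 'big'); only the cycle collector can.
            tb = sys.exc_info()[2]
            return None

    def fixed():
        try:
            raise ValueError("boom")
        except ValueError, e:
            try:
                pass                  # body
            finally:
                e = None              # the compiler-inserted cleanup
                del e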
---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:49 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated. ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 17:50:00 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 08:50:00 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and test cases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:50 Message: Logged In: YES user_id=1344176 Originator: YES File Added: exc_cleanup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:49 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated.
---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 17:51:57 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 08:51:57 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and test cases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:51 Message: Logged In: YES user_id=1344176 Originator: YES Patches updated in response to PJE's comment (http://mail.python.org/pipermail/python-3000/2007-January/005430.html): """In the tuple or list case, there's no need to reset the variables, because then the traceback won't be present any more; the exception object will have been discarded after unpacking.""" ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:50 Message: Logged In: YES user_id=1344176 Originator: YES File Added: exc_cleanup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:49 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated.
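For illustration, the "tuple or list case" PJE refers to is Python 2's unpacking except target; a short hypothetical example:

    # The except target can be an unpacking pattern. The exception object
    # itself is never bound to a name, so there is no reference to reset
    # and no cleanup needs to be inserted for this form.
    try:
        open("/no/such/file")
    except IOError, (errno_, strerror):
        # Only the unpacked values survive; the IOError instance (and any
        # traceback it might carry under PEP 344) is discarded after
        # unpacking.
        print errno_, strerror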
---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 19:50:02 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 10:50:02 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemberg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). 
But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often references using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Mon Jan 8 21:44:57 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 12:44:57 -0800 Subject: [Patches] [ python-Patches-909005 ] asyncore fixes and improvements Message-ID: Patches item #909005, was opened at 2004-03-03 16:07 Message generated for change (Comment added) made by klimkin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexey Klimkin (klimkin) Assigned to: A.M. Kuchling (akuchling) Summary: asyncore fixes and improvements Initial Comment: Minor: * 0/1 for boolean values replaced with False/True. * (887279) Added handling of POLLPRI as POLLIN. POLLERR, POLLHUP, POLLNVAL are handled as exception event. handle_expt_event gets recent error from self.socket object and raises socket.error. * Default readable()/writable() returns False. * Added "map" parameter for file_dispatcher. * file_wrapper: removed "return" in close(), recv/read and send/write swapped because of their nature. * mac code for writable() removed. Manual for accept() on mac is similar to the one on linux. * Repeating exception changed from "raise socket.error, why" to raise. * Added connected/accepting/addr reset on close(). 
Initialization of variables moved to __init__. * close_all() now calls close() for each dispatcher object; EBADF is treated as an already-closed socket/file. * Added channel id to "unhandled..." messages. Bugs: * Fixed bug (654766,889153): the client never gets connected and never gets an error. A connecting client gets a writable event from select(); however, some clients may want to always be non-writable. Such a client may never get connected. The fix adds _readable() - always True for an accepting and always False for a connecting socket - and _writable() - always False for an accepting and always True for a connecting socket. This implies that a listening dispatcher's readable() and writable() will never be called. ("man accept" and "man connect" for non-blocking sockets.) * Fixed bug: error handling after accept(). It's said that accept() can return EWOULDBLOCK even for a readable socket. This means that even after handle_accept(), the dispatcher's accept() may still raise EWOULDBLOCK. The new code does the accept() itself and stores the accepted socket in self.__pending_accept. If there was a socket.error, it's treated as EWOULDBLOCK. The dispatcher's accept() returns self.__pending_accept and resets it to None. Features: * Added pending_read() and pending_write(). These functions help to use a dispatcher over non-socket objects with buffering capabilities. In the original dispatcher, if the socket does a buffered read and some data is in the buffer, entering asyncore.poll() doesn't finish, since there is no data in the real file/socket. This feature allows using SSL sockets, since such a socket reads data in 16k chunks. ---------------------------------------------------------------------- >Comment By: Alexey Klimkin (klimkin) Date: 2007-01-08 23:44 Message: Logged In: YES user_id=410460 Originator: YES 1) The patch was developed not during academic research, but during the coding of true non-blocking client-server applications capable of running on both Linux and Windows. The original code had a lot of issues with everything: some parts were not truly non-blocking, not every socket could be passed, there were issues under high load, etc. 2) We used medusa for SSL capability in our project. However, it's impossible to get fully non-blocking functionality with the original asyncore and the original medusa, so the functionality was extended to support these features as well. That is what idispatcher is for. 3) In the end we got pretty reliable code, which supports the features I described here and has tons of bugs and issues fixed. Again, I didn't fix bugs for any academic purpose - every fix was driven by a real issue we met during development. I also don't think that these fixes are bound too tightly to our project - I believe I made them pretty general. 4) It's possible that some parts could be made better for other applications. But if you follow the same path - developing a truly non-blocking client-server with medusa's SSL capabilities - I think you will end up with the same thing. 5) I don't insist on including the patch in the Python tree as is. I'm quite happy using the modified asyncore in my private library. My intention was to share my experience. Please use it if you need to. 6) The development I mention above was in 2004, so the patch has been out of sync with reality for 2 years already. Some issues it was solving may be gone by now. I also don't know what is going on with SSL for Python - there seem to be new libraries as well. ...so... just use it as you want... or as you don't want ;) ...
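A simplified, hypothetical sketch of the accept() fix described in the bug list above - do the accept() at event time, treat socket.error as EWOULDBLOCK, and hand the already-accepted socket out later. Only the attribute name __pending_accept comes from the description; this is not the patch itself.

    import socket

    class AcceptingDispatcher(object):
        def __init__(self, sock):
            self.socket = sock
            self.__pending_accept = None

        def handle_read_event(self):
            # Called when select() says the listening socket is readable.
            try:
                self.__pending_accept = self.socket.accept()
            except socket.error:
                self.__pending_accept = None   # treated as EWOULDBLOCK
            if self.__pending_accept is not None:
                self.handle_accept()

        def accept(self):
            # User code calls this from handle_accept(); it can no longer
            # raise EWOULDBLOCK, because the accept() already happened.
            result, self.__pending_accept = self.__pending_accept, None
            return result

        def handle_accept(self):
            conn, addr = self.accept()
            conn.close()   # placeholder: real code would create a handler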
---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 07:53 Message: Logged In: YES user_id=341410 Originator: NO In asynchat, the only stuff that should be accepted is the handle_read() changes. The deque removal should be ignored (we have deques since Python 2.4, which are *significantly* faster than lists in nontrivial applications), and the iasync_chat stuff, like the idispatcher stuff, seems unnecessary. And that's pretty much it for asynchat. The proposed asynchttp module shouldn't go into the Python standard library until it has lived on its own for a nontrivial amount of time in the Cheeseshop and is found to be as good as httplib, urllib, or urllib2. Even then, its inclusion should be questioned, as medusa (the http server based on asyncore) has been around for a decade or more, is used in many places, and yet still isn't in the standard library. The asyncoreTest.py needs a bit of work (I notice some incorrect names), but could be used as an addition to the test suite (currently it seems as though only asynchat is tested). ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 07:42 Message: Logged In: YES user_id=341410 Originator: NO Many of the changes in the source provided by klimkin in his most recent revision from February 27, 2005 seek to solve certain problems in an inconsistent or incorrect way. Some of his changes (or variants thereof) are worthwhile. I'll start with my issues with his asyncore changes, then describe what I think should be added from them. For example, in his updated asyncore.py, the list of sockets is first shuffled randomly, then sorted based on priority. Assuming that one ignores priorities for a moment, if there were more sockets than the max sockets for the platform, then due to the limitations of randomness, there would be no guarantees that all sockets would get polled. Say, for example, that one were using windows and were running close to the actual select file handle limit (512 in Python 2.3) with 500 handles: you would skip 436 of the sockets *this pass*. In 10 passes, there would have been 100 sockets that were never polled. In 20 passes, there would still be, on average, 20 that were never polled. So this "randomization" step is the wrong thing to do, unless you actually make multiple select calls for each poll() call. But really, select is limited by 512, and I've run it with 500 without issue. The priority-based sorting has many of the same problems, but it is even worse when you have nontrivial numbers of differing priorities, regardless of randomization. The max socket limit of 64 on Windows isn't correct. It's been 512 since at least Python 2.3. And all other platforms being 65536? No. I've had some versions of linux die on me at 512, others at 4096, but all were dog slow beyond 500 or so. It's better to let the underlying system raise an exception for the user when it fails and let them attempt to tune it, rather than forcing a tuning that may not be correct. The "pending read" stuff is also misdirected. Assuming a non-broken async client or server, either should be handling content as it comes in, dispatching as necessary. See asynchat.collect_incoming_data() and asynchat.found_terminator() for examples. The idispatcher stuff seems unnecessary. Generally speaking, it seems to me that there are 3 levels of abstraction going on: 1) handle_*_event(), called by poll, poll2, etc.
2) handle_*(), called by handle_*_event(); the user overrides these, and they call other handle_*() and *() methods 3) *() (aka recv, send, close, etc.), called by handle_*(), generally left alone. Some of your code breaks the abstraction and has items in layer 2 call items in layer 1, which then call items in layer 2 again. This seems unnecessary, and breaks the general downward calling semantic (except in the case of errors returned by layer 3 resulting in layer 2 handle_close() calls, which is the proper method to call). There are, according to my reading of the asyncore portions of your included module, a few things that may be worthy of inclusion in the Python standard library: * A variant of your changes to close_all(), though it should proceed in closing everything unless a KeyboardInterrupt, SystemExit, or ExitNow exception is raised. Socket errors should be ignored, because we are closing them - we don't care about their error condition. * Checking sockets for socket error via socket.getsockopt(). * A variant of your .close() implementation. * The CONNRESET, etc., stuff in the send() and recv() methods, but not the handle_close_event() replacements; stick with handle_close(). * Checking for KeyboardInterrupt and SystemExit inside the poll functions. * The _closed_socket class and initialization. All but the last of the above I would consider to be bugfixes, and if others agree that these are reasonable changes, I'll write up a patch against trunk and 2.5 maintenance. The last change, while I think it would be nice, probably shouldn't be included in 2.5 maintenance, though I think it would be fine for the trunk. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2005-02-27 00:39 Message: Logged In: YES user_id=410460 Minor improvements: * Added handle_close_event(): calls handle_close(), then closes the channel. No need to write self.close() in each handle_close(). * Improved exception handling. KeyboardInterrupt is not blocked. For a Python exception, handle_error_event() is called, which checks for KeyboardInterrupt and closes the socket if handle_error() didn't. Bugs: * Calling connect() could raise an exception without hitting handle_error(). Now, if there was an exception, handle_error_event() is called. Features: * set_timeout(): sets a timeout for the dispatcher object; if there was no I/O for the object, ETIMEDOUT is raised, which is handled by handle_error_event(). * Fixed issue with Windows - too many descriptors in select(). The list of sockets is shuffled and only the first asyncore.max_channels are used in select(). * Added set_prio(): sets a priority for the dispatcher. After the shuffle, the list of sockets is sorted by priority. You may also check asynhttplib, an asynchronous version of httplib. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-07-02 17:44 Message: Logged In: YES user_id=410460 In addition to "[ 909005 ] asyncore fixes and improvements" and CVS version "asyncore.py,v 2.51" this patch provides: * Added handling of a buffered socket layer (pending_read(), pending_write()). * Added the fd number to __repr__. * Initialized self.socket = socket._closedsocket() instead of None for verbose error output (like a closed socket.socket). * asyncore and asynchat implement idispatcher and iasync_chat. * Fixed self.addr initialization. * Removed import exceptions. * Don't filter KeyboardInterrupt, just pass it through. * Added a queue of sockets, which solves the problem of select() on too many descriptors.
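For illustration, a hypothetical sketch of the buffered-layer idea behind pending_read() in the list above. It assumes a layer object exposing pending() and fileno() (an assumption for the sketch, e.g. an SSL binding with such methods - not klimkin's actual code).

    # A dispatcher over a buffered layer (e.g. an SSL connection that
    # decrypts 16k at a time) must tell the poll loop that buffered data is
    # waiting even when the underlying fd is not readable.
    class BufferedDispatcher(object):
        def __init__(self, sock_layer):
            self._layer = sock_layer   # assumed: has pending() and fileno()

        def pending_read(self):
            # True if the buffered layer already holds decoded bytes.
            return self._layer.pending() > 0

    def ready_for_read(dispatchers, readable_fds):
        # The poll loop treats "fd readable" OR "layer has buffered data"
        # as a read event, so buffered bytes are never stranded.
        return [d for d in dispatchers
                if d.pending_read() or d._layer.fileno() in readable_fds]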
I have run make test in python cvs distrib without problems. Examples of using i* included. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-06-05 21:54 Message: Logged In: YES user_id=11375 I've struggled to get the test suite running without errors on my machine, but have failed. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-22 09:15 Message: Logged In: YES user_id=410460 There is no real reason for this change, please undo. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:18 Message: Logged In: YES user_id=11375 In your version of file_dispatch.__init__, the .set_file() call is moved earlier; can you say why? ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:13 Message: Logged In: YES user_id=11375 Added "map" parameter for file_dispatcher and dispatcher_with_send in CVS HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:08 Message: Logged In: YES user_id=11375 Repeating exception changes ('raise socket.error' -> just 'raise') checked into HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:02 Message: Logged In: YES user_id=11375 Mac code for writable() removed from HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:02 Message: Logged In: YES user_id=11375 Patch to use True/False applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 22:55 Message: Logged In: YES user_id=11375 Fix for bug #887279 applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 22:48 Message: Logged In: YES user_id=11375 The many number of changes in this patch make it difficult to figure out which changes fix which problem. I've created a new directory in CVS, nondist/sandbox/asyncore, that contains copies of the module with these patches applied, and will work on applying changes to the copy in dist/src. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-17 10:15 Message: Logged In: YES user_id=410460 Sorry, unfortunately I have lost old patch file. I have atached new one. In addition to fixes, listed above, the patch includes: 1. Fix for operating on uninitialized socket. self.socket now initializes with _closed_socket(), so any operation throws EBADF. 2. Added class idispatcher - base class for dispatcher. The purpose of this class is to allow simple replacement of media(dispatcher interface) in classes, derived from dispatcher class. This is based on 'object'. I have also attached asynchat.diff - example for new-style dispatcher. Old asynchat works as well. ---------------------------------------------------------------------- Comment By: Wummel (calvin) Date: 2004-03-11 18:49 Message: Logged In: YES user_id=9205 There is no file attached! You have to click on the checkbox next to the upload filename. 
This is a Sourceforge annoyance :( ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 From noreply at sourceforge.net Mon Jan 8 23:55:17 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 14:55:17 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 23:55 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Neal Norwitz (nnorwitz) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting the encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain when someone replaces only one of them (in sitecustomize). Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this from a unittest :P ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Tue Jan 9 02:10:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 17:10:56 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch.
Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation?
I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Tue Jan 9 02:26:29 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 17:26:29 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (A toy Python model of this cost profile is sketched below.)
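As a rough illustration only - a toy Python model of the cost profile described here, not the actual C implementation from the patch - concatenation can be O(1) while the first character access pays the one-time O(total length) render:

    class LazyConcat:
        # Toy model: defer joining the parts until a character is needed.
        def __init__(self, *parts):
            self.parts = list(parts)   # strings and/or LazyConcat nodes
            self.rendered = None

        def __add__(self, other):
            return LazyConcat(self, other)          # O(1) per concatenation

        def _render(self):
            if self.rendered is None:
                out, stack = [], [self]
                while stack:                        # walk the concat tree
                    node = stack.pop()
                    if isinstance(node, LazyConcat):
                        stack.extend(reversed(node.parts))
                    else:
                        out.append(node)
                self.rendered = "".join(out)        # O(total length), once
            return self.rendered

        def __getitem__(self, i):
            return self._render()[i]   # O(n) the first time, O(1) after

    s = LazyConcat("spam ") + "and " + "eggs"   # three O(1) operations
    s[0]                                        # triggers the single render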
(It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Tue Jan 9 02:26:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 17:26:33 -0800 Subject: [Patches] [ python-Patches-1631035 ] SyntaxWarning for backquotes Message-ID: Patches item #1631035, was opened at 2007-01-09 12:26 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631035&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Thomas Wouters (twouters) Summary: SyntaxWarning for backquotes Initial Comment: The following patch (for 2.6) issues a SyntaxWarning for backquotes/backticks in source code. I had to add the filename to struct compiling in Python/ast.c - this seems like the neatest way to get the filename passed around. (see also the XXX before ast_error) Assigned to twouters, since it was his idea in the first place. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631035&group_id=5470 From noreply at sourceforge.net Tue Jan 9 07:29:21 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 22:29:21 -0800 Subject: [Patches] [ python-Patches-1631171 ] implement warnings module in C Message-ID: Patches item #1631171, was opened at 2007-01-08 22:29 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631171&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: implement warnings module in C Initial Comment: Re-implement the warnings module in C for speed and to reduce start-up time. I don't remember the exact state of this patch; I'm sure it needs cleanup. IIRC the only thing missing feature-wise was processing command-line arguments, though I'm not entirely sure - it's been a while since I did it. I think I should not have used as many gotos in the code, and I recall not liking how complex the error handling was. This definitely needs review. If anyone wants to finish this off, go for it. I'll probably return to it, but it won't be for a few weeks at the earliest. It would probably be good to make comments to remind me of what needs to be done. The new file should be Python/_warnings.c. I couldn't decide whether to put it under Python/ or Modules/; it seems some builtin modules are in both places. Maybe we should determine where the appropriate place is and move them all there. I couldn't figure out how to get svn to do a diff of a file that wasn't checked in. I think I filtered out all the unrelated changes. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631171&group_id=5470 From noreply at sourceforge.net Tue Jan 9 07:30:09 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 22:30:09 -0800 Subject: [Patches] [ python-Patches-1631171 ] implement warnings module in C Message-ID: Patches item #1631171, was opened at 2007-01-08 22:29 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631171&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: implement warnings module in C Initial Comment: Re-implement the warnings module in C for speed and to reduce start-up time. I don't remember the exact state of this patch; I'm sure it needs cleanup. IIRC the only thing missing feature-wise was processing command-line arguments, though I'm not entirely sure - it's been a while since I did it. I think I should not have used as many gotos in the code, and I recall not liking how complex the error handling was. This definitely needs review. If anyone wants to finish this off, go for it. I'll probably return to it, but it won't be for a few weeks at the earliest. It would probably be good to make comments to remind me of what needs to be done. The new file should be Python/_warnings.c. I couldn't decide whether to put it under Python/ or Modules/; it seems some builtin modules are in both places. Maybe we should determine where the appropriate place is and move them all there. I couldn't figure out how to get svn to do a diff of a file that wasn't checked in. I think I filtered out all the unrelated changes. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-08 22:30 Message: Logged In: YES user_id=33168 Originator: YES File Added: _warnings.c ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631171&group_id=5470 From noreply at sourceforge.net Tue Jan 9 07:55:01 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 22:55:01 -0800 Subject: [Patches] [ python-Patches-1631035 ] SyntaxWarning for backquotes Message-ID: Patches item #1631035, was opened at 2007-01-09 12:26 Message generated for change (Comment added) made by anthonybaxter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631035&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Thomas Wouters (twouters) Summary: SyntaxWarning for backquotes Initial Comment: The following patch (for 2.6) issues a SyntaxWarning for backquotes/backticks in source code. I had to add the filename to struct compiling in Python/ast.c - this seems like the neatest way to get the filename passed around. (see also the XXX before ast_error) Assigned to twouters, since it was his idea in the first place. ---------------------------------------------------------------------- >Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-09 17:55 Message: Logged In: YES user_id=29957 Originator: YES And here's another one (on top of the last one) that also emits SyntaxWarnings for <>. Not sure I like 'NOTEQUALSOLD' as a token name, but I couldn't think of anything better. This one should probably only be enabled when the Py3K warnings flag is on.
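For reference, these are the constructs the two warnings target (an illustrative Python 2 snippet; under these patches each line would draw a SyntaxWarning at compile time):

    x = `40 + 2`     # backticks, the old spelling of repr(40 + 2)
    if x <> '42':    # <>, the old spelling of !=
        print x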
File Added: notequals.diff ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631035&group_id=5470 From noreply at sourceforge.net Tue Jan 9 12:12:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 03:12:30 -0800 Subject: [Patches] [ python-Patches-1631394 ] sre module has misleading docs Message-ID: Patches item #1631394, was opened at 2007-01-09 11:12 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631394&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.4 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tom Lynn (tlynn) Assigned to: Nobody/Anonymous (nobody) Summary: sre module has misleading docs Initial Comment: >>> help(sre) ... "$" Matches the end of the string. ... \Z Matches only at the end of the string. ... M MULTILINE "^" matches the beginning of lines as well as the string. "$" matches the end of lines as well as the string. The docs for "$" are misleading - it actually matches in newline-specific ways which the module's built-in docs don't hint at. The MULTILINE docs don't clarify this. I'd also like to see "from sre import __doc__" added to the end of re.py; lack of "help(re)" is a bigger problem than having slightly wrong auto-generated docs for the re module itself. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631394&group_id=5470 From noreply at sourceforge.net Tue Jan 9 19:22:39 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 10:22:39 -0800 Subject: [Patches] [ python-Patches-1631394 ] sre module has misleading docs Message-ID: Patches item #1631394, was opened at 2007-01-09 12:12 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631394&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.4 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tom Lynn (tlynn) Assigned to: Nobody/Anonymous (nobody) Summary: sre module has misleading docs Initial Comment: >>> help(sre) ... "$" Matches the end of the string. ... \Z Matches only at the end of the string. ...
M MULTILINE "^" matches the beginning of lines as well as the string. "$" matches the end of lines as well as the string. The docs for "$" are misleading - it actually matches in newline-specific ways which the module's built-in docs don't hint at. The MULTILINE docs don't clarify this. I'd also like to see "from sre import __doc__" added to the end of re.py; lack of "help(re)" is a bigger problem than having slightly wrong auto-generated docs for the re module itself. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-09 19:22 Message: Logged In: YES user_id=21627 Originator: NO Did you mean to include a patch? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631394&group_id=5470 From noreply at sourceforge.net Tue Jan 9 21:01:18 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 12:01:18 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Martin Kammerhofer (mkam) Assigned to: Thomas Heller (theller) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux-specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running FreeBSD 6.0, NetBSD 3.0, and OpenBSD 3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows:

    FreeBSD 6.0: libc.so.6, libm.so.4
    NetBSD 3.0:  libc.so.12, libm.so.0
    OpenBSD 3.9: libc.so.39.0, libm.so.2.1

If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?)
---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed-length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.) File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the a.out executable format.) Unfortunately my FreeBSD patch has the assumption of a single version number built in; more specifically, the cmp(*map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD is welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 on OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look?
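A minimal sketch of the more general sort mkam describes above (all trailing dot-separated numeric fields compared left to right; the helper name _version_key is hypothetical):

    import re

    def _version_key(libname):
        # "libc.so.39.0" -> (39, 0); "libfoo.so.10" -> (10,)
        m = re.search(r'((\.\d+)+)$', libname)
        if not m:
            return ()
        return tuple(int(part) for part in m.group(1).split('.')[1:])

    # Tuples compare elementwise, so libfoo.so.10 ranks above
    # libfoo.so.8.9, matching the "numerically highest version" intent.
    best = max(['libfoo.so.8.9', 'libfoo.so.10'], key=_version_key)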
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Tue Jan 9 23:54:09 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 14:54:09 -0800 Subject: [Patches] [ python-Patches-1500611 ] (py3k) Remove the sets module Message-ID: Patches item #1500611, was opened at 2006-06-04 16:38 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1500611&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: (py3k) Remove the sets module Initial Comment: This patch removes the sets module, its documentation and tests, in addition to replacing all usages of it with the built-in set type. The patch is against r46648. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-09 17:54 Message: Logged In: YES user_id=1344176 Originator: YES File Added: py3k-remove_sets_module.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2006-08-31 18:44 Message: Logged In: YES user_id=1344176 The patch has been updated to r51654. I'm not sure how well `svn diff` handles removed files, so you might have to `svn rm` Lib/sets.py, Lib/test/test_sets.py and Doc/lib/libsets.py manually. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-08-26 16:26 Message: Logged In: YES user_id=6380 This patch seems out of date -- can you refresh it? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1500611&group_id=5470 From noreply at sourceforge.net Wed Jan 10 02:29:17 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 17:29:17 -0800 Subject: [Patches] [ python-Patches-1500611 ] (py3k) Remove the sets module Message-ID: Patches item #1500611, was opened at 2006-06-04 16:38 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1500611&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: (py3k) Remove the sets module Initial Comment: This patch removes the sets module, its documentation and tests, in addition to replacing all usages of it with the built-in set type. The patch is against r46648. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 20:29 Message: Logged In: YES user_id=6380 Originator: NO Checked in. Thanks! 
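For readers updating their own code, the substitution this patch performs throughout the stdlib is essentially the following (illustrative only):

    # Before (2.x, the deprecated sets module):
    from sets import Set
    s = Set([1, 2]) | Set([2, 3])

    # After (the built-in set type):
    s = set([1, 2]) | set([2, 3])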
---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 17:54 Message: Logged In: YES user_id=1344176 Originator: YES File Added: py3k-remove_sets_module.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2006-08-31 18:44 Message: Logged In: YES user_id=1344176 The patch has been updated to r51654. I'm not sure how well `svn diff` handles removed files, so you might have to `svn rm` Lib/sets.py, Lib/test/test_sets.py and Doc/lib/libsets.py manually. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-08-26 16:26 Message: Logged In: YES user_id=6380 This patch seems out of date -- can you refresh it? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1500611&group_id=5470 From noreply at sourceforge.net Wed Jan 10 03:12:52 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 18:12:52 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 03:13:23 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 18:13:23 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. 
new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 03:13:53 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 18:13:53 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 03:14:58 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 18:14:58 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. 
Category: Core (C code) Group: Python 3000 >Status: Closed >Resolution: Duplicate Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and testcases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:14 Message: Logged In: YES user_id=1344176 Originator: YES This patch has been superseded by patch #1631942. ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:51 Message: Logged In: YES user_id=1344176 Originator: YES Patches updated in response to PJE's comment (http://mail.python.org/pipermail/python-3000/2007-January/005430.html): """In the tuple or list case, there's no need to reset the variables, because then the traceback won't be present any more; the exception object will have been discarded after unpacking.""" ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:50 Message: Logged In: YES user_id=1344176 Originator: YES File Added: exc_cleanup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:49 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated. ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Wed Jan 10 04:41:43 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 19:41:43 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None >Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib?

    patching file Lib/tarfile.py
    Hunk #1 succeeded at 1540 (offset 38 lines).
    Hunk #3 succeeded at 1573 (offset 38 lines).
    Hunk #5 succeeded at 1745 (offset 38 lines).
    Hunk #7 succeeded at 1786 (offset 38 lines).
    Hunk #9 FAILED at 1795.
    1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej

---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 06:39:15 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 21:39:15 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 Status: Open >Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 00:39 Message: Logged In: YES user_id=6380 Originator: NO For some strange reason, test_exceptions was wrong.
I'm guessing that the newly added test should be this:

    def testExceptionCleanup(self):
        # Make sure "except V as N" exceptions are cleaned up properly
        try:
            raise Exception()
        except Exception as e:
            self.failUnless(e)
        self.failIf('e' in locals())

(it had ', e' instead of 'as e', and there was an unneeded 'del e' after self.failUnless(e).) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib?

    patching file Lib/tarfile.py
    Hunk #1 succeeded at 1540 (offset 38 lines).
    Hunk #3 succeeded at 1573 (offset 38 lines).
    Hunk #5 succeeded at 1745 (offset 38 lines).
    Hunk #7 succeeded at 1786 (offset 38 lines).
    Hunk #9 FAILED at 1795.
    1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej

---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 12:58:00 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 03:58:00 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by mkam You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Martin Kammerhofer (mkam) Assigned to: Thomas Heller (theller) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.)
While at it I also tidied up the Linux-specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 12:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails). I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running FreeBSD 6.0, NetBSD 3.0, and OpenBSD 3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows:

    FreeBSD 6.0: libc.so.6, libm.so.4
    NetBSD 3.0:  libc.so.12, libm.so.0
    OpenBSD 3.9: libc.so.39.0, libm.so.2.1

If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed-length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.) File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the a.out executable format.) Unfortunately my FreeBSD patch has the assumption of a single version number built in; more specifically, the cmp(*map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD is welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like?
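To make the discussion concrete, here is a rough sketch of the FreeBSD-style lookup being described (the exact "ldconfig -r" output format, e.g. "382:-lc.6 => /usr/lib/libc.so.6", is an assumption here, as is the helper name _findLib_bsd; the real patch may differ):

    import os
    import re

    def _findLib_bsd(name):
        # Scan 'ldconfig -r' output for '-l<name>.<ver> => /path/lib<name>...'
        ename = re.escape(name)
        expr = re.compile(r':-l%s\.\S+ => \S*/(lib%s\.\S+)' % (ename, ename))
        with os.popen('/sbin/ldconfig -r 2>/dev/null') as f:
            candidates = expr.findall(f.read())
        if not candidates:
            return None
        # Prefer the numerically highest version (cf. the sort sketch above).
        def key(lib):
            m = re.search(r'((\.\d+)+)$', lib)
            return tuple(int(p) for p in m.group(1).split('.')[1:]) if m else ()
        return max(candidates, key=key)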
---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 on OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Wed Jan 10 15:40:24 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 06:40:24 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-10 09:40 Message: Logged In: YES user_id=1344176 Originator: YES I think there were only four files I had to patch manually after running 2to3; each used automatic exception unpacking. 2to3 successfully fixes Lib/tarfile.py (as of tarfile.py r53336, 2to3 r53339). The 'del e' in testExceptionCleanup() was indeed needed; it was there to verify that the transformation was

    except V as N:
        try:
            ...
        finally:
            N = None
            del N

and not

    except V as N:
        try:
            ...
        finally:
            del N

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 00:39 Message: Logged In: YES user_id=6380 Originator: NO For some strange reason, test_exceptions was wrong. I'm guessing that the newly added test should be this:

    def testExceptionCleanup(self):
        # Make sure "except V as N" exceptions are cleaned up properly
        try:
            raise Exception()
        except Exception as e:
            self.failUnless(e)
        self.failIf('e' in locals())

(it had ', e' instead of 'as e', and there was an unneeded 'del e' after self.failUnless(e).) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib?

    patching file Lib/tarfile.py
    Hunk #1 succeeded at 1540 (offset 38 lines).
    Hunk #3 succeeded at 1573 (offset 38 lines).
    Hunk #5 succeeded at 1745 (offset 38 lines).
    Hunk #7 succeeded at 1786 (offset 38 lines).
    Hunk #9 FAILED at 1795.
    1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej

---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 17:23:41 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 08:23:41 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 11:23 Message: Logged In: YES user_id=6380 Originator: NO Thanks!! Submitted, with tarfile.py and test_exceptions.py corrected (kept the 'del e' in the latter). Committed revision 53342.
Note: there is now a new test_hotshot failure, probably due to the different code generated for except clauses; I'm keeping this patch open for that. Is it time to drop the unpacking (sequence) behavior from exceptions altogether, per Brett's PEP? (That would be a new SF patch.) ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-10 09:40 Message: Logged In: YES user_id=1344176 Originator: YES I think there were only four files I had to patch manually after running 2to3; each used automatic exception unpacking. 2to3 successfully fixes Lib/tarfile.py (as of tarfile.py r53336, 2to3 r53339). The 'del e' in testExceptionCleanup() was indeed needed; it was there to verify that the transformation was

    except V as N:
        try:
            ...
        finally:
            N = None
            del N

and not

    except V as N:
        try:
            ...
        finally:
            del N

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 00:39 Message: Logged In: YES user_id=6380 Originator: NO For some strange reason, test_exceptions was wrong. I'm guessing that the newly added test should be this:

    def testExceptionCleanup(self):
        # Make sure "except V as N" exceptions are cleaned up properly
        try:
            raise Exception()
        except Exception as e:
            self.failUnless(e)
        self.failIf('e' in locals())

(it had ', e' instead of 'as e', and there was an unneeded 'del e' after self.failUnless(e).) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib?

    patching file Lib/tarfile.py
    Hunk #1 succeeded at 1540 (offset 38 lines).
    Hunk #3 succeeded at 1573 (offset 38 lines).
    Hunk #5 succeeded at 1745 (offset 38 lines).
    Hunk #7 succeeded at 1786 (offset 38 lines).
    Hunk #9 FAILED at 1795.
    1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej

---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470
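For readers following the thread, the cleanup semantics the quoted test verifies can be observed directly. A small illustration, runnable under a Python with the new "except ... as" syntax (this is just the behavior, not code from the patch):

    try:
        raise Exception()
    except Exception as e:
        print(e is not None)   # True: the name is bound inside the handler
    print('e' in locals())     # False: the implicit "N = None; del N"
                               # ran when the handler exited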
From noreply at sourceforge.net Wed Jan 10 19:24:04 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 10:24:04 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 01:37 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer.
This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 10:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 17:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 17:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 10:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2007-01-08 02:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 21:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Wed Jan 10 21:00:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 12:00:30 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 >Status: Closed Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-10 15:00 Message: Logged In: YES user_id=1344176 Originator: YES The hotshot failure may have been related to magic number/bytecode differences, but since a "make clean" resolves the problem, the issue is considered closed (http://mail.python.org/pipermail/python-3000/2007-January/005501.html).
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 11:23 Message: Logged In: YES user_id=6380 Originator: NO Thanks!! Submitted, with tarfile.py and test_exceptions.py corrected (kept the 'del e' in the latter). Committed revision 53342. Note: there is now a new test_hotshot failure, probably due to the different code generated for except clauses; I'm keeping this patch open for that. Is it time to drop the unpacking (sequence) behavior from exceptions altogether, per Brett's PEP? (That would be a new SF patch.) ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-10 09:40 Message: Logged In: YES user_id=1344176 Originator: YES I think there were only four files I had to patch manually after running 2to3; each used automatic exception unpacking. 2to3 successfully fixes Lib/tarfile.py (as of tarfile.py r53336, 2to3 r53339). The 'del e' in testExceptionCleanup() was indeed needed; it was there to verify that the transformation was

    except V as N:
        try:
            ...
        finally:
            N = None
            del N

and not

    except V as N:
        try:
            ...
        finally:
            del N

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 00:39 Message: Logged In: YES user_id=6380 Originator: NO For some strange reason, test_exceptions was wrong. I'm guessing that the newly added test should be this:

    def testExceptionCleanup(self):
        # Make sure "except V as N" exceptions are cleaned up properly
        try:
            raise Exception()
        except Exception as e:
            self.failUnless(e)
        self.failIf('e' in locals())

(it had ', e' instead of 'as e', and there was an unneeded 'del e' after self.failUnless(e).) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib? patching file Lib/tarfile.py Hunk #1 succeeded at 1540 (offset 38 lines). Hunk #3 succeeded at 1573 (offset 38 lines). Hunk #5 succeeded at 1745 (offset 38 lines). Hunk #7 succeeded at 1786 (offset 38 lines). Hunk #9 FAILED at 1795.
1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 21:24:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 12:24:51 -0800 Subject: [Patches] [ python-Patches-1627052 ] backticks will not be used at all Message-ID: Patches item #1627052, was opened at 2007-01-03 15:21 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Georg Brandl (gbrandl) Summary: backticks will not be used at all Initial Comment: In python 3, backticks will not mean repr. Every few months, someone suggests a new meaning for them. This clarifies that they won't be reused at all. ---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-10 20:24 Message: Logged In: YES user_id=849994 Originator: NO Committed as rev. 53359. Had to fix the markup a bit. Can anyone tell me how to include a lone backtick in a ReST `` `` block? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-03 15:22 Message: Logged In: YES user_id=764593 Originator: YES Assigning to PEP owner, Georg. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 From noreply at sourceforge.net Wed Jan 10 21:30:36 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 12:30:36 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. 
There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)
File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
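To see the tradeoff being debated here, a rough micro-benchmark along these lines (hypothetical harness, numbers vary by build and interpreter) compares the += loop against the ''.join() idiom:

    import timeit

    def concat_loop(n=10000):
        s = ''
        for i in range(n):
            s += 'x' * 10       # quadratic on builds without concat tricks
        return s

    def join_list(n=10000):
        parts = []
        for i in range(n):
            parts.append('x' * 10)
        return ''.join(parts)   # one final linear-time copy

    print(timeit.timeit(concat_loop, number=10))
    print(timeit.timeit(join_list, number=10))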
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Wed Jan 10 21:59:14 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 12:59:14 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 10:37 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 21:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces.
To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 21:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 19:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 02:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 02:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode().
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 19:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 11:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 06:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
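As a thought experiment only (the real patch is C code inside unicodeobject.c; this toy class merely mirrors the idea), a pure-Python rope-like object makes the O(1)-concatenate / O(n)-render split concrete:

    class LazyConcat:
        # Toy analogue of a lazy concatenation object: '+' builds a
        # two-child node in O(1); rendering walks the tree once and
        # caches the result, after which indexing is O(1) again.
        def __init__(self, left, right):
            self.left, self.right = left, right
            self._rendered = None

        def __add__(self, other):
            return LazyConcat(self, other)

        def __str__(self):
            if self._rendered is None:
                self._rendered = str(self.left) + str(self.right)
            return self._rendered

    s = LazyConcat('abc', 'def') + 'ghi'
    print(str(s))    # 'abcdefghi' -- the copying happens here, not at '+'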
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Thu Jan 11 01:13:03 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 16:13:03 -0800 Subject: [Patches] [ python-Patches-1627052 ] backticks will not be used at all Message-ID: Patches item #1627052, was opened at 2007-01-03 10:21 Message generated for change (Comment added) made by goodger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Closed Resolution: Accepted Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Georg Brandl (gbrandl) Summary: backticks will not be used at all Initial Comment: In python 3, backticks will not mean repr. Every few months, someone suggests a new meaning for them. This clarifies that they won't be reused at all. ---------------------------------------------------------------------- >Comment By: David Goodger (goodger) Date: 2007-01-10 19:13 Message: Logged In: YES user_id=7733 Originator: NO Just do it: "`````". The meaning tends to get lost in the noise though. What you did is fine, but you don't need the backslash-escape. reST is smart enough to realize that (`) is ` in parentheses. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2007-01-10 15:24 Message: Logged In: YES user_id=849994 Originator: NO Committed as rev. 53359. Had to fix the markup a bit. Can anyone tell me how to include a lone backtick in a ReST `` `` block? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-03 10:22 Message: Logged In: YES user_id=764593 Originator: YES Assigning to PEP owner, Georg. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 From noreply at sourceforge.net Thu Jan 11 02:20:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 17:20:27 -0800 Subject: [Patches] [ python-Patches-1615701 ] Creating dicts for dict subclasses Message-ID: Patches item #1615701, was opened at 2006-12-14 08:08 Message generated for change (Settings changed) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615701&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Walter Dörwald (doerwalter) >Assigned to: Raymond Hettinger (rhettinger) Summary: Creating dicts for dict subclasses Initial Comment: This patch changes dictobject.c so that creating dicts from mapping-like objects only uses the internal dict functions if the argument is a *real* dict, not a subclass. This means that overwritten keys() and __getitem__() methods are now honored. In addition to that, the fallback implementation now tries iterkeys() before trying keys(). It also adds a PyMapping_IterKeys() macro.
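The behaviour this patch targets is easy to demonstrate. A short sketch (the class names are invented for illustration): on an unpatched interpreter, dict() silently bypasses overridden methods when the argument is a dict subclass, while honouring them on a plain mapping object:

    class LoudDict(dict):
        def keys(self):
            print("keys() called")
            return dict.keys(self)
        def __getitem__(self, key):
            print("__getitem__ called")
            return dict.__getitem__(self, key)

    class LoudMapping:
        def keys(self):
            print("keys() called")
            return ['a']
        def __getitem__(self, key):
            print("__getitem__ called")
            return 1

    dict(LoudDict(a=1))    # fast path: prints nothing, overrides ignored
    dict(LoudMapping())    # mapping path: both overridden methods run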
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2006-12-20 07:59 Message: Logged In: YES user_id=89016 Originator: YES To clear up some apparent misunderstandings: This patch does *not* advocate that some dict methods should be implemented by calling other dict methods so that dict subclasses only have to overwrite a few methods to gain a completely consistent implementation. This patch only fixes the dict constructor (and update()) and consists of two parts: (1) There are three code paths in dictobject.c::dict_update_common():

(a) if the constructor argument doesn't have a "keys" attribute, treat it as an iterable of items;

(b) if the argument has a "keys" attribute, but is not a dict (and not an instance of a subclass of dict), use keys() and __getitem__() to make a copy of the mapping-like object;

(c) if the argument has a "keys" attribute and is a dict (or an instance of a subclass of dict), bypass any of the overwritten methods that the object might provide and directly use the dict implementation.

This patch changes PyDict_Merge() so that code path (b) is used for dict constructor arguments that are subclasses of dict, so that any overwritten methods are honored. (2) This means that now, if a subclass of dict is passed to the constructor or update(), the code is IMHO more correct (it honors the reimplementation of the mapping methods), but slower. To reduce the slowdown, instead of using keys() and __getitem__(), iterkeys() and __getitem__() are used. I can't see why the current behaviour should be better: Yes, it is faster, but it is also wrong: Without the patch the behaviour of dict() and dict.update() depends on whether the argument happens to subclass dict or not. If it doesn't, all is well: the argument is treated as a mapping (i.e. keys() and __getitem__() are used); otherwise the methods are completely ignored. So can we agree on the fact that (1) is desirable? (At least Guido said as much: http://mail.python.org/pipermail/python-dev/2006-December/070341.html) BTW, I only added PyMapping_Iterkeys() because it mirrors PyMapping_Keys(). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-19 19:13 Message: Logged In: YES user_id=80475 Originator: NO Since update already supports (key, item) changes, I do not see the rationale in trying to expand the definition of what is dict-like to include a try-this, then try-that approach. This is a little too ad-hoc for too little benefit. Also, I do not see the point of adding PyMapping_Iterkeys to the C API. It affords no advantage over its macro definition (the current one-way-to-do-it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-19 18:00 Message: Logged In: YES user_id=80475 Originator: NO It is also asking for bugs if someone hooks __getitem__ and starts to make possibly invalid assumptions about what other changes occur implicitly. Also, this patch kills the performance of builtin subclasses. If I subclass dict to add a new method, it would suck to have the performance of all of the other methods drop precariously. This patch is somewhat overzealous. It encroaches on the territory of UserDict.DictMixin which was specifically made for propagating new behaviors. It unnecessarily exposes implementation details.
It introduces implicit behaviors that should be done through explicit overrides of methods whose behavior is supposed to change. And, it is at the top of a slippery slope that we don't want to go down (i.e. do we want to guarantee that list.append is implemented in terms of list.extend, etc). Python has no shortage of places where builtin subclasses make direct calls to the underlying C code -- this patch leads to a bottomless pit of changes that kill performance and make implicit side-effects the norm instead of the exception. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-19 17:29 Message: Logged In: YES user_id=764593 Originator: NO FWIW, I'm not sure I agree on not specifying which methods can share implementation. If someone hooks __getitem__ but not get, that is just asking for bugs. (The implementation of get may -- but need not -- make its own call to __getitem__, and not everyone will make the same decision.) ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-19 17:26 Message: Logged In: YES user_id=764593 Originator: NO As I understand it, the problem is that dict.update is assuming any dict subclass will use the same internal data representation. Restricting the fast path to exactly builtin dicts (not subclasses) fixes the bug, but makes the fallback more frequent. The existing fallback is to call keys(), then iterate over it, retrieving the value for each key. (keys is required for a "minimal mapping" as documented in UserDict, and a few other random places.) The only potential dependency on other methods is his proposed new intermediate path that avoids creating a list of all keys, by using iterkeys if it exists. (I suggested using iteritems to avoid the lookups.) If iter* aren't implemented, the only harm is falling back to the existing fallback of "for k in keys():" ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-19 16:07 Message: Logged In: YES user_id=80475 Originator: NO I'm -1 on making ANY guarantees about which methods underlie others -- that would constitute new and everlasting guarantees about how mappings are implemented. Subclasses should explicitly override/extend the methods with changed behavior. If that proves non-trivial, then it is likely there should be a has-a relationship instead of an is-a relationship. Either way, there is probably a design flaw. Also, it is likely that the subclass will have Liskov substitutability violations. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2006-12-19 14:23 Message: Logged In: YES user_id=89016 Originator: YES iteritems() has to create a new tuple for each item, so this might be slower. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-19 12:50 Message: Logged In: YES user_id=764593 Originator: NO Why are you using iterkeys instead of iteritems? It seems like if they've filled out the interface enough to have iterkeys, they've probably filled it out all the way, and you do need the value as soon as you get the key.
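In rough Python terms (a sketch of the control flow only, not the actual C code in dict_update_common()), the three paths Walter describes look like this:

    def update_from(target, source):
        if type(source) is dict:
            # path (c): a real dict goes through the internal fast path
            target.update(source)
        elif hasattr(source, 'keys'):
            # path (b): treat anything with keys() as a mapping; prefer
            # iterkeys() when available to avoid building the key list
            get_keys = getattr(source, 'iterkeys', source.keys)
            for key in get_keys():
                target[key] = source[key]
        else:
            # path (a): otherwise an iterable of (key, value) pairs
            for key, value in source:
                target[key] = value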
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615701&group_id=5470 From noreply at sourceforge.net Thu Jan 11 08:55:58 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 23:55:58 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 20:36 Message generated for change (Comment added) made by arigo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases. ---------------------------------------------------------------------- >Comment By: Armin Rigo (arigo) Date: 2007-01-11 07:55 Message: Logged In: YES user_id=4771 Originator: NO Reimplementing the whole file interface as proxy functions might be the safest route, yes. I realized that the __getattr__() magic is also used to serve at least one special method, namely the __iter__() of the file objects. This only works with old-style classes. In the long-term future, when old-style classes disappear and these classes become new-style, this is likely to introduce a subtle bug. ---------------------------------------------------------------------- Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-08 15:53 Message: Logged In: YES user_id=7446 Originator: YES I agree it would break in such a situation, but I'm not clear on which direction your bias leads you (specifically, which do we get right -- don't use bound methods, or don't use the __getattr__ magic?). I could fix this by defining "proxy" functions (and some properties) for the whole file interface, rather than just the methods that potentially trigger rollover. That would lose a little efficiency, but mostly only in reading (calling, e.g., f.read() will always result in two function applications; in the current model, after the first call it runs at "native" speed). It would also lose forward compatibility if the file protocol changes, although I'm not sure how likely that is. Would you like me to do that? ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2007-01-08 08:26 Message: Logged In: YES user_id=4771 Originator: NO The __getattr__ magic makes the following kind of code fail with SpooledTemporaryFile:

    f = SpooledTemporaryFile(max_size=something)
    rd = f.read
    wr = f.write
    for x in y:
        ...use rd(size) and wr(data)...

The problem is that the captured 'f.read' method is the one from the StringIO instance, even after the write() rolled the file over to disk. Given that capturing bound methods is a semi-official speed hack advertised in some respected places, we might have to be careful about it. About such matters I am biased towards first getting it right and then getting it fast...
Also, Python 2.5 is already out, so this will probably be a 2.6 addition. ---------------------------------------------------------------------- Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-07 20:37 Message: Logged In: YES user_id=7446 Originator: YES File Added: SpooledTemporaryFile.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Thu Jan 11 22:50:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 11 Jan 2007 13:50:27 -0800 Subject: [Patches] [ python-Patches-1598415 ] Logging Module - followfile patch Message-ID: Patches item #1598415, was opened at 2006-11-17 15:44 Message generated for change (Comment added) made by vsajip You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.5 Status: Open Resolution: Invalid Priority: 5 Private: No Submitted By: chads (cjschr) Assigned to: Vinay Sajip (vsajip) Summary: Logging Module - followfile patch Initial Comment: Pertaining to the FileHandler and the file being written to: It's possible that the file being written to will be rolled over by an external application such as newsyslog. By default, FileHandler tracks the file descriptor, not the file. If the original file is renamed, the file descriptor is still updated; however, it's probably desired that continued updates to the original file take place instead. This patch adds an attribute to the FileHandler class constructor (and basicConfig kw as well). If the attribute evaluates to True, the filename, not the descriptor, is tracked. Basically, the code compares the file status from a previous emit call to the current call before the base class emit is called. If a difference in st_ino or st_dev is found, the current stream is flushed/closed and a new one, based on baseFilename, is created, file status is updated, and then the base class emit is called. ---------------------------------------------------------------------- >Comment By: Vinay Sajip (vsajip) Date: 2007-01-11 21:50 Message: Logged In: YES user_id=308438 Originator: NO I've had a bit more of a think about this, and realised that I made a boo-boo in one of my earlier comments. Under Windows, log files are opened with exclusive locks, so that other processes cannot rename or move files which are open. So I believe the approach won't work at all under Windows. (Chad, sorry about making you redo the patch with ST_SIZE rather than ST_DEV and ST_INO). I also think this is a less common use case than warrants supporting it at the basicConfig() level, which is for really very basic usage configuration. So I would advocate adding a WatchedFileHandler (in logging.handlers) which watches st_dev and st_ino (as per Chad's original patch) and closes the old file descriptor and reopens the file when a change is seen. Some recent changes checked into SVN trunk facilitate the reopening - I've added an _open() method to FileHandler to do this. Chad, what do you think of this approach? ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 17:06 Message: Logged In: YES user_id=1093928 Originator: YES Uploaded the wrong diff.
----------------------------------------------------------------------
Comment By: chads (cjschr) Date: 2006-11-20 17:06 Message: Logged In: YES user_id=1093928 Originator: YES
Uploaded the wrong diff. This is the correct one.

----------------------------------------------------------------------
Comment By: chads (cjschr) Date: 2006-11-20 17:02 Message: Logged In: YES user_id=1093928 Originator: YES
Updated per vsajip to work on Windoze too. The code now checks for a current size < previous size (based on ST_SIZE).

----------------------------------------------------------------------
Comment By: Vinay Sajip (vsajip) Date: 2006-11-19 20:32 Message: Logged In: YES user_id=308438 Originator: NO
This patch, relying as it does on Unix-specific details such as i-nodes, does not appear as if it will work under Windows. For that reason I will mark it as Pending and Invalid for now; if cjschr can update this tracker item with how the patch will work on Windows, I will look at it further. The SF system will automatically close it if no update is made to the item in approx. 2 weeks, though it can still be reopened after that.

----------------------------------------------------------------------
Comment By: Georg Brandl (gbrandl) Date: 2006-11-18 19:14 Message: Logged In: YES user_id=849994 Originator: NO
Assigning to Vinay.

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470

From noreply at sourceforge.net Fri Jan 12 03:42:11 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 18:42:11 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Larry Hastings (lhastings)
Assigned to: Nobody/Anonymous (nobody)
Summary: The Unicode "lazy strings" patches

Initial Comment:
These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent.

----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES
lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes.
To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least likely to most likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors.
2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.
3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.
4. The patch is not accepted.

Of course, I'm open to suggestions of other approaches. (Not to mention patches!)

Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does.

As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination.

* Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO
Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons).
In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design.

----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES
Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted.

----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO
From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode?

----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES
Continuing the comedy of errors: concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)
File Added: lch.py3k.unicode.lazy.concat.patch.3.txt

----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES
Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt

----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES
jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate.

lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO
While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage
* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them)
* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO
What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
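
The cost model Larry describes (and jcarlson asks about) is easy to see in a pure-Python toy. The real patch does this in C inside unicodeobject.c; the class below only models the claimed complexities, nothing more: '+' just records the operands, the first access pays one linear render, and indexing is O(1) thereafter.

    class LazyConcat:
        def __init__(self, pieces):
            self.pieces = pieces        # flat list of ordinary strings
            self.value = None           # rendered result, once computed

        def __add__(self, other):
            tail = other.pieces if isinstance(other, LazyConcat) else [other]
            # copies piece *pointers*, not characters: cheap per concatenation
            return LazyConcat(self.pieces + tail)

        def render(self):
            if self.value is None:
                self.value = ''.join(self.pieces)   # the one O(total length) step
                self.pieces = [self.value]
            return self.value

        def __getitem__(self, i):
            return self.render()[i]     # linear on first use, O(1) afterwards

    a = LazyConcat(['spam'])
    s = a + 'and' + 'eggs'      # no character copying has happened yet
    c = s[4]                    # first index renders 'spamandeggs'; later
                                # lookups hit the cached rendered string

So the answer to jcarlson's question in this model: (a + b + c + ...)[i] pays the render once, proportional to the total data, and is O(1) from then on.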
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Fri Jan 12 03:50:13 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 18:50:13 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES
File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Fri Jan 12 04:12:28 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 19:12:28 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470
----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES
Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html

One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away.

As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.)

I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.

File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt
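
The pathological case Larry mentions, and what .simplify() is meant to buy, can likewise be modelled in a few lines of Python. This is a toy stand-in for the C patch: only the method name simplify() comes from the comment above; everything else is illustrative.

    class LazySlice:
        def __init__(self, base, start, stop):
            self.base = base            # pins the whole base string in memory
            self.start, self.stop = start, stop
            self.value = None

        def render(self):
            if self.value is None:
                self.value = self.base[self.start:self.stop]
                self.base = None        # the giant base can now be collected
            return self.value

        def simplify(self):
            self.render()               # force the copy, drop the base
            return self                 # same string, different bits underneath

    huge = 'x' * (100 * 1024 * 1024)    # ~100 MB
    tiny = LazySlice(huge, 0, 5)        # cheap: no characters copied
    del huge                            # ...but the 100 MB buffer stays alive,
                                        # still reachable through tiny.base
    tiny.simplify()                     # copies 5 chars; the 100 MB is freeable

The design question Larry raises maps directly onto the last line: simplify() mutates in place and returns self, rather than returning a new object, because the externally visible value never changes.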
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Fri Jan 12 05:25:49 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 20:25:49 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES
File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Fri Jan 12 05:32:06 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 20:32:06 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470
There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch.
However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2).
While this error is still common among new users of Python, generally users only get bitten once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear in *what* depends on the inputs. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them?
(currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Fri Jan 12 07:55:34 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 11 Jan 2007 22:55:34 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 01:37 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-11 22:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant.
Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful; it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, it would be acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 20:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 20:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 19:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch.
However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 18:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 18:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2007-01-10 12:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 12:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 10:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2).
While this error is still common among new users of Python, generally users only get bitten once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 17:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 17:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 10:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear in *what* depends on the inputs. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 02:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them?
(currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 21:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Fri Jan 12 08:13:37 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 11 Jan 2007 23:13:37 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-12 18:13 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Nobody/Anonymous (nobody) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Fri Jan 12 08:31:26 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 11 Jan 2007 23:31:26 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-12 18:13 Message generated for change (Comment added) made by anthonybaxter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Nobody/Anonymous (nobody) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. ---------------------------------------------------------------------- >Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-12 18:31 Message: Logged In: YES user_id=29957 Originator: YES Updated version of patch - fixes interactive mode, adds builtins.print File Added: print_function.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Fri Jan 12 18:57:07 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 09:57:07 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. 
The empty string (option 2) or nonempty but fixed-size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful; it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, it would be acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.)
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return.
Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string.
I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bitten once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode().
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear in *what* depends on the inputs. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
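To make the wrapper/view alternative raised in the comments above concrete (Lemburg's slice-integers technique, Carlson's proposed view type), here is a minimal pure-Python sketch; the class and its names are invented for illustration and are not part of any patch on this item:

    # Invented illustration of a "string view": slicing records offsets
    # into the base string instead of copying characters.
    class StrView(object):
        def __init__(self, base, start=0, stop=None):
            if stop is None:
                stop = len(base)
            self.base, self.start, self.stop = base, start, stop

        def __len__(self):
            return self.stop - self.start

        def __getitem__(self, i):
            # O(1) indexing straight into the base string.
            if not 0 <= i < len(self):
                raise IndexError(i)
            return self.base[self.start + i]

        def slice(self, i, j):
            # O(1): a new view over the same base; no characters copied.
            return StrView(self.base, self.start + i, self.start + j)

        def materialize(self):
            # Plays the same role as the .simplify() idea above: pay for
            # one copy and stop keeping a possibly huge base string alive.
            return self.base[self.start:self.stop]

    v = StrView(u"lazy strings are fun").slice(5, 12)
    assert v.materialize() == u"strings"

Because the view is a separate type, the base unicode representation stays simple and PyUnicode_AS_UNICODE() never has to fail, which is the property this thread keeps circling around.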
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Fri Jan 12 21:11:13 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 12:11:13 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None >Priority: 9 Private: No Submitted By: Martin Kammerhofer (mkam) >Assigned to: Neal Norwitz (nnorwitz) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-12 21:11 Message: Logged In: YES user_id=11105 Originator: NO Neal, I think this can go into the release25-maint branch since it repairs the ctypes.util.find_library function on BSD systems. What do you think? ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 12:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails.) I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running Freebsd6.0, NetBSD3.0, and OpenBSD3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows: FreeBSD6.0: libc.so.6, libm.so.4 NetBSD3.0: libc.so.12, libm.so.0 OpenBSD3.9: libc.so.39.0, libm.so.2.1 If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? 
(Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed-length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.) File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the aout executable format.) Unfortunately my freebsd patch has the assumption of a single version number built in; more specifically the cmp(* map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, using a more general sort algorithm, i.e. sorting by all trailing /(\.\d+)+/ fields. Said output from NetBSD is welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 on OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look?
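The more general ordering mkam describes above — comparing by every trailing numeric field rather than only the last one — can be sketched as a Python sort key. The function name and sample data are illustrative, not taken from the patch:

    def _version_key(libname):
        # Gather every trailing dot-separated numeric field, so that
        # "libfoo.so.8.9" -> (8, 9) and "libfoo.so.10" -> (10,); tuple
        # comparison then ranks libfoo.so.10 above libfoo.so.8.9.
        fields = []
        for field in reversed(libname.split(".")):
            if not field.isdigit():
                break
            fields.append(int(field))
        return tuple(reversed(fields))

    candidates = ["libc.so.5", "libc.so.39.0", "libc.so.6"]
    assert max(candidates, key=_version_key) == "libc.so.39.0"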
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Fri Jan 12 21:21:28 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 12:21:28 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Martin Kammerhofer (mkam) Assigned to: Neal Norwitz (nnorwitz) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-12 21:21 Message: Logged In: YES user_id=11105 Originator: NO Committed into trunk as revision 53402. Thanks for the patch and the work on it. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 21:11 Message: Logged In: YES user_id=11105 Originator: NO Neal, I think this can go into the release25-maint branch since it repairs the ctypes.util.find_library function on BSD systems. What do you think? ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 12:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails.) I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running Freebsd6.0, NetBSD3.0, and OpenBSD3.9. 
The output from "print find_library('c'), find_library('m')" on these systems is as follows: FreeBSD6.0: libc.so.6, libm.so.4 NetBSD3.0: libc.so.12, libm.so.0 OpenBSD3.9: libc.so.39.0, libm.so.2.1 If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is to different.) File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared library. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD has used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the aout executable format.) Unfortunately my freebsd patch has the assumption of a single version number built in; more specifically the cmp(* map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based an the last dot separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what looks its sys.platform like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 in OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Fri Jan 12 21:48:54 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 12:48:54 -0800 Subject: [Patches] [ python-Patches-1617699 ] slice-object support for ctypes Pointer/Array Message-ID: Patches item #1617699, was opened at 2006-12-18 05:28 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617699&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Heller (theller) Summary: slice-object support for ctypes Pointer/Array Initial Comment: Support for slicing ctypes' Pointer and Array types with slice objects, although only for the step=1 case. (Backported from p3yk-noslice branch.) ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-12 21:48 Message: Logged In: YES user_id=11105 Originator: NO Thomas, a question: Since steps != 1 are not supported, does this patch have any value? IIUC, array[x:y] returns exactly the same as array[x:y:1] for all x and y values. Formally, the patch is missing unittests and documentation ;-). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:45 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617699&group_id=5470 From noreply at sourceforge.net Sat Jan 13 01:03:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 16:03:22 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted.
There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to to string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). 
Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, it would be acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. 
No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z).
Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemberg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often references using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. 
if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sat Jan 13 03:32:45 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 18:32:45 -0800 Subject: [Patches] [ python-Patches-1634499 ] Py3k: Fix pybench so it runs Message-ID: Patches item #1634499, was opened at 2007-01-13 02:32 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634499&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tests Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Py3k: Fix pybench so it runs Initial Comment: This patch fixes pybench so it runs under the current Py3k trunk. I don't claim to have done the right thing, or even that my patch should be accepted. I submit it only in the hope that it's useful to somebody. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634499&group_id=5470 From noreply at sourceforge.net Sat Jan 13 15:39:48 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 06:39:48 -0800 Subject: [Patches] [ python-Patches-1563842 ] platform.py support for IronPython Message-ID: Patches item #1563842, was opened at 2006-09-23 03:59 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563842&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: M.-A. Lemburg (lemburg) Summary: platform.py support for IronPython Initial Comment: The following patch supplies minimal support for IronPython in platform.py - it makes the sys.version parsing not choke and die. There's a bunch of missing information from IronPython's sys.version string, not much that can be done there. Should platform.py grow an 'implementation' option, so it can detect whether it's IronPython, CPython, Jython, or something else? Patch is against svn trunk. 
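One plausible shape for the 'implementation' detection asked about above, based only on the sys.version and sys.platform conventions quoted in this thread; the actual check committed to platform.py may well differ:

    import sys

    def python_implementation():
        # IronPython and Jython identify themselves in sys.version and
        # sys.platform respectively; CPython's sys.version starts
        # directly with the version number.
        if sys.version.startswith("IronPython"):
            return "IronPython"
        if sys.platform.startswith("java"):
            return "Jython"
        return "CPython"

    print(python_implementation())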
---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-13 15:39 Message: Logged In: YES user_id=38388 Originator: NO sanxiyn: What do the extra numbers after the 1.0 stand for ? Do those correspond to branch and revision ? Armin: I'll add support for sys.version_info and sys.subversion as well. ---------------------------------------------------------------------- Comment By: Seo Sanghyeon (sanxiyn) Date: 2006-10-10 04:35 Message: Logged In: YES user_id=837148 The current patch doesn't parse sys.version from IronPython 1.0.1. IronPython 1.0 gives: IronPython 1.0.60816 on .NET 2.0.50727.42 IronPython 1.0.1 gives: IronPython 1.0 (1.0.61005.1977) on .NET 2.0.50727.42 ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2006-10-10 00:48 Message: Logged In: YES user_id=4771 Python2.5 has grown a sys.subversion attribute: ('CPython', 'trunk', '51999') The first field is intended to describe the exact implementation of Python. platform.py could return this if it is available. It should also probably try to use sys.version_info instead of, or in addition to, using a regexp on sys.version. One can hope that in the long term the version_info and the subversion attributes should eventually be supported by all Python implementation (PyPy...). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-09-25 12:30 Message: Logged In: YES user_id=38388 Thanks. I'll install IronPython and see what else needs to be done. I've already added a few fixes to make Jython play nice with platform.py that I'll check in as well. And yes: I'll add a python_implementation() function that returns 'CPython', 'Jython' and 'IronPython' as appropriate. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563842&group_id=5470 From noreply at sourceforge.net Sat Jan 13 18:39:40 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 09:39:40 -0800 Subject: [Patches] [ python-Patches-1634778 ] Add aliases for latin7/9/10 charsets Message-ID: Patches item #1634778, was opened at 2007-01-13 18:39 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634778&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Christoph Zwerschke (cito) Assigned to: Nobody/Anonymous (nobody) Summary: Add aliases for latin7/9/10 charsets Initial Comment: This patch adds the latin-7, latin-9 and latin-10 aliases in some places where they were missing (see http://mail.python.org/pipermail/python-list/2006-December/416921.html). 
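The effect of the patch can be approximated at runtime through the alias table the codec machinery consults. The alias keys below are illustrative, not the patch's exact additions; the underlying mappings are the standard ones (latin-7 = ISO 8859-13, latin-9 = ISO 8859-15, latin-10 = ISO 8859-16):

    import codecs
    import encodings.aliases

    encodings.aliases.aliases.update({
        "latin_7": "iso8859_13",
        "latin_9": "iso8859_15",
        "latin_10": "iso8859_16",
    })

    # The codec machinery normalizes "latin-9" to "latin_9" before
    # consulting the alias table, so the dashed spelling resolves too.
    assert codecs.lookup("latin-9").name == "iso8859-15"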
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634778&group_id=5470 From noreply at sourceforge.net Sat Jan 13 21:50:28 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 12:50:28 -0800 Subject: [Patches] [ python-Patches-1352731 ] Small upgrades to platform.platform() Message-ID: Patches item #1352731, was opened at 2005-11-10 02:19 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1352731&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: daishi harada (daishiharada) Assigned to: M.-A. Lemburg (lemburg) Summary: Small upgrades to platform.platform() Initial Comment: This patch updates platform.platform() to recognize some more Linux distributions. In addition, for RedHat-like distributions, will use the contents of the /etc/ to determine distname. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-13 21:50 Message: Logged In: YES user_id=38388 Originator: NO I'll add a new API linux_distribution() which will provide the more detailed information and also add support for Rocks (as well as a few others). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-11-09 08:49 Message: Logged In: YES user_id=38388 I'm currently working on an updated version of platform.py that will include part of this patch, patch #1563842 for IronPython and better support for Jython. ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2006-11-09 06:21 Message: Logged In: YES user_id=21627 Marc-Andre, would you rather accept or reject this patch, because of the incompatibility? ---------------------------------------------------------------------- Comment By: daishi harada (daishiharada) Date: 2006-10-10 01:22 Message: Logged In: YES user_id=493197 Thanks for the response. If by "break" you mean that for redhat-like distros the output of `python platform.py` would no longer necessarily be the same after the patch is applied, yes, that's true. However, that was the primary motivation for the patch - the current platform.py wasn't sufficiently discriminating for my purposes. In particular, the current platform.py ignores the first "field" of the contents of /etc/redhat-release, which I believe for ROCKS was the only portion which was changed from the redhat version on which it was based. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-10-09 20:31 Message: Logged In: YES user_id=38388 Sorry for the late reply. I must have missed the initial SF mail. I've had a look at the patch, but I'm not sure whether it can be accepted: wouldn't it break already recognized RedHat-like platforms ? ---------------------------------------------------------------------- Comment By: daishi harada (daishiharada) Date: 2005-11-10 02:23 Message: Logged In: YES user_id=493197 assigning to lemberg as suggested in the file. 
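A sketch of the parsing daishiharada describes, keeping the leading name field of /etc/redhat-release instead of discarding it; the regular expression and helper name are illustrative, not the committed code:

    import re

    _release_re = re.compile(r"(.+?)\s+release\s+([\d.]+)\s*(?:\((.*)\))?")

    def parse_release_line(line):
        # Keep the leading name field too: it is the only part that
        # rebranded RedHat derivatives such as ROCKS actually change.
        m = _release_re.match(line.strip())
        return m.groups() if m else None

    assert parse_release_line("Red Hat Linux release 7.3 (Valhalla)") == \
        ("Red Hat Linux", "7.3", "Valhalla")
    # A rebranded line such as "Rocks release 4.2.1 (...)" now yields
    # "Rocks" as the distribution name instead of being misreported.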
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1352731&group_id=5470 From noreply at sourceforge.net Sat Jan 13 22:14:31 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 13:14:31 -0800 Subject: [Patches] [ python-Patches-675976 ] mhlib does not obey MHCONTEXT env var Message-ID: Patches item #675976, was opened at 2003-01-28 10:16 Message generated for change (Comment added) made by sjoerd You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675976&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Sjoerd Mullender (sjoerd) Assigned to: Nobody/Anonymous (nobody) Summary: mhlib does not obey MHCONTEXT env var Initial Comment: All programs in the (N)MH suite of programs use the MHCONTEXT environment variable to find the so-called context file where the current folder is remembered. mhlib should do the same, so that it can be used in combination with the standard (N)MH programs. Also, when writing the context file, mhlib should replace the Current-Folder line but keep the other lines in tact. The attached patch fixes both problems. It introduces a new method for the class MH called getcontextfile which uses the MHCONTEXT environment variable to find the context file, and it uses the already existing function updateline to update the context file. Some questions concerning this patch: - should I document the new method or should it be an internal method only? - should the fix be ported to older Python versions? With the patch it does behave differently if you have an MHCONTEXT environment variable. ---------------------------------------------------------------------- >Comment By: Sjoerd Mullender (sjoerd) Date: 2007-01-13 22:14 Message: Logged In: YES user_id=43607 Originator: YES I have added a line to the docstring and I have added a method description to the library reference. Other than those changes, the new patch is identical to the old. I can check this in if you want. File Added: mhlib.patch ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2006-12-22 19:00 Message: Logged In: YES user_id=11375 Originator: NO The patch looks OK. Regarding your questions: 1) I think the method should be documented; it might be useful to subclasses of MH. 2) New feature, so 2.6 only. ---------------------------------------------------------------------- Comment By: Sjoerd Mullender (sjoerd) Date: 2003-02-13 10:48 Message: Logged In: YES user_id=43607 I can assure you that I did check that checkmark. Maybe it's my browser in combination with SF. We'll see if it works this time. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-13 04:02 Message: Logged In: YES user_id=33168 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. 
:-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675976&group_id=5470 From noreply at sourceforge.net Sat Jan 13 23:33:46 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 14:33:46 -0800 Subject: [Patches] [ python-Patches-1563842 ] platform.py support for IronPython Message-ID: Patches item #1563842, was opened at 2006-09-23 03:59 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563842&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: M.-A. Lemburg (lemburg) Summary: platform.py support for IronPython Initial Comment: The following patch supplies minimal support for IronPython in platform.py - it makes the sys.version parsing not choke and die. There's a bunch of missing information from IronPython's sys.version string, not much that can be done there. Should platform.py grow an 'implementation' option, so it can detect whether it's IronPython, CPython, Jython, or something else? Patch is against svn trunk. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-13 23:33 Message: Logged In: YES user_id=38388 Originator: NO Checked in a version that also supports IronPython, including the 1.0.1 version. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-13 15:39 Message: Logged In: YES user_id=38388 Originator: NO sanxiyn: What do the extra numbers after the 1.0 stand for ? Do those correspond to branch and revision ? Armin: I'll add support for sys.version_info and sys.subversion as well. ---------------------------------------------------------------------- Comment By: Seo Sanghyeon (sanxiyn) Date: 2006-10-10 04:35 Message: Logged In: YES user_id=837148 The current patch doesn't parse sys.version from IronPython 1.0.1. IronPython 1.0 gives: IronPython 1.0.60816 on .NET 2.0.50727.42 IronPython 1.0.1 gives: IronPython 1.0 (1.0.61005.1977) on .NET 2.0.50727.42 ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2006-10-10 00:48 Message: Logged In: YES user_id=4771 Python2.5 has grown a sys.subversion attribute: ('CPython', 'trunk', '51999') The first field is intended to describe the exact implementation of Python. platform.py could return this if it is available. It should also probably try to use sys.version_info instead of, or in addition to, using a regexp on sys.version. One can hope that in the long term the version_info and the subversion attributes should eventually be supported by all Python implementation (PyPy...). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-09-25 12:30 Message: Logged In: YES user_id=38388 Thanks. I'll install IronPython and see what else needs to be done. I've already added a few fixes to make Jython play nice with platform.py that I'll check in as well. And yes: I'll add a python_implementation() function that returns 'CPython', 'Jython' and 'IronPython' as appropriate. 
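A regular expression covering both sys.version formats quoted by sanxiyn might look like the sketch below; it is illustrative, not the expression lemburg checked in:

    import re

    _ironpython_re = re.compile(
        r"IronPython\s+([\d.]+)"       # marketing version
        r"(?:\s+\(([\d.]+)\))?"        # optional detailed build number
        r"\s+on\s+(.+)")               # underlying runtime

    for line in ("IronPython 1.0.60816 on .NET 2.0.50727.42",
                 "IronPython 1.0 (1.0.61005.1977) on .NET 2.0.50727.42"):
        version, build, runtime = _ironpython_re.match(line).groups()
        print((version, build, runtime))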
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563842&group_id=5470 From noreply at sourceforge.net Sun Jan 14 00:21:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 15:21:11 -0800 Subject: [Patches] [ python-Patches-1620174 ] Improve platform.py usability on Windows Message-ID: Patches item #1620174, was opened at 2006-12-21 15:49 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1620174&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Luke Dunstan (infidel) Assigned to: M.-A. Lemburg (lemburg) Summary: Improve platform.py usability on Windows Initial Comment: This patch modifies platform.py to remove most of the dependencies on pywin32, and use the standard ctypes and _winreg modules instead. It also adds support for Windows CE. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-14 00:21 Message: Logged In: YES user_id=38388 Originator: NO platform.py is used outside the Python distribution to check which Python version is being used (among other things). It has to run with Python versions as early as 1.5.2. That said, it's OK to have it use different ways of accessing the needed information, provided that the signatures and return values of the public APIs don't change. ---------------------------------------------------------------------- Comment By: Luke Dunstan (infidel) Date: 2007-01-01 07:25 Message: Logged In: YES user_id=30442 Originator: YES Why does platform.py need to be compatible with earlier versions of Python? The return types haven't changed, and I think the return values won't change because the same OS APIs are being used. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-12-31 19:49 Message: Logged In: YES user_id=38388 Originator: NO I haven't looked at the patch yet, so just a few general comments on changes to platform.py: * the code must continue to work with Python versions prior to 2.6 This means that ctypes and _winreg support may be added as an option, but removing pywin32 calls is not the right way to proceed. * changes in return type of the public and documented APIs are not possible If you have a need for more information, then a new API should be added, or the information merged into one of the existing return fields. * changes in the return values of APIs due to use of different OS APIs must be avoided There's code out there relying on the return values, so if in doubt a new API must be provided. ---------------------------------------------------------------------- Comment By: Luke Dunstan (infidel) Date: 2006-12-31 06:57 Message: Logged In: YES user_id=30442 Originator: YES 1. Yes this is intended for 2.6 2. The only difference between win32api.RegQueryValueEx and _winreg.QueryValueEx seems to be that the latter returns Unicode strings. I have adjusted the patch to be more compatible with the old behaviour. 3. I have updated the doc string in the new patch. 
File Added: platform-wince-2.diff ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-31 01:13 Message: Logged In: YES user_id=764593 Originator: NO ( win32api.RegQueryValueEx is _winreg.QueryValueEx ) ? If not, it should wait for 2.6, and there should be an entry in what's new. (I suppose similar concerns exist for other return classes.) The change to win32_ver only half-corrects the return type to the four-tuple. The meaning of release (even if it is just "release name") should be specified in the text. def win32_ver(release='',version='',csd='',ptype=''): """ Get additional version information from the Windows Registry - and return a tuple (version,csd,ptype) referring to version + and return a tuple (release,version,csd,ptype) referring to version number, CSD level and OS type (multi/single processor). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1620174&group_id=5470 From noreply at sourceforge.net Sun Jan 14 01:27:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 16:27:56 -0800 Subject: [Patches] [ python-Patches-1619846 ] Bug fixes for int unification branch Message-ID: Patches item #1619846, was opened at 2006-12-20 21:36 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1619846&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Adam Olsen (rhamphoryncus) Assigned to: Guido van Rossum (gvanrossum) Summary: Bug fixes for int unification branch Initial Comment: This patch should fix all the real bugs in the int unification branch. All the remaining bugs are either external to the branch or due to tests that need updating (mostly due to the names of int vs long). External bugs: test_socket: http://sourceforge.net/tracker/index.php?func=detail&aid=1619659&group_id=5470&atid=105470 test_class: seems to be caused by using new-style classes by default. Unrelated to int-unification. test_set: inheritance of __hash__. I believe this was fixed in p3yk already. Test failures due to naming differences: test_ctypes test_doctest test_generators test_genexps test_optparse test_pyexpat Tests needing updating, not just due to name differences: test_descr test_pickletools The following aspects need specific review: PyLong_FromVoidPtr was doing the cast wrong. GCC was compiling the (unsigned Py_LONG_LONG)p cast in such a way as to produce a value larger than 2**32, obviously wrong on this 32bit box, and it warned about the cast too. Making it cast to Py_uintptr_t first seems to have corrected both the behaviour and the warning, but may be wrong on other architectures. Many of my changes to use PyInt_CheckExact may be better served by creating a PyInt_CheckSmall macro that retains the range check but allows subclasses. Alternatively, the index interface could be used, but would require more rewriting perhaps best left until later. There are some areas that handled signed vs unsigned and int vs long a bit differently, and they may still need work. Hard to tell what behaviour is correct in such cases. 
Skipped files: Doc/ext/run-func.c Mac/Modules/ctl/_Ctlmodule.c Mac/Modules/dlg/_Dlgmodule.c Mac/Modules/win/_Winmodule.c Mac/Modules/pycfbridge.c Modules/carbonevt/_CarbonEvtmodule.c Modules/_sqlite/connection.c Modules/almodule.c Modules/cgensupport.c Modules/clmodule.c Modules/flmodule.c Modules/grpmodule.c Modules/posicmodule.c:conv_confname Modules/pyexpat.c Modules/svmodule.c Modules/termios.c Modules/_bsddb.c Modules/_sqlite/statement.c PC/_winreg.c Python/dynload_beos.c Python/mactoolboxglue.c Python/marshal.c Python/pythonrun.c:handle_system_exit RISCOS/Modules/drawfmodule.c RISCOS/Modules/swimodule.c ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 19:27 Message: Logged In: YES user_id=6380 Originator: NO For now I'm just going to submit this; then I'll think about the implications later. My highest priority is to get this merged back into the p3yk branch, although I have no idea how to do that yet... ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 19:05 Message: Logged In: YES user_id=6380 Originator: NO I'll be taking over this branch. ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2006-12-21 16:22 Message: Logged In: YES user_id=21627 Originator: NO Not this year anymore. I'll try to early next year (hopefully first week of January). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-21 16:12 Message: Logged In: YES user_id=6380 Originator: NO Martin, do you have time to look at this? I'll play with it too but I'd like to have your opinion. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1619846&group_id=5470 From noreply at sourceforge.net Sun Jan 14 00:54:57 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 15:54:57 -0800 Subject: [Patches] [ python-Patches-1634499 ] Py3k: Fix pybench so it runs Message-ID: Patches item #1634499, was opened at 2007-01-12 21:32 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634499&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tests Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Py3k: Fix pybench so it runs Initial Comment: This patch fixes pybench so it runs under the current Py3k trunk. I don't claim to have done the right thing, or even that my patch should be accepted. I submit it only in the hope that it's useful to somebody. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 18:54 Message: Logged In: YES user_id=6380 Originator: NO Thanks, applied! 
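Returning to the int-unification item above: the PyLong_FromVoidPtr fix it describes — cast through a pointer-sized unsigned type before widening — can be modelled from Python with ctypes. addr_as_unsigned is an illustrative helper, not CPython code:

    import ctypes

    _PTR_BITS = 8 * ctypes.sizeof(ctypes.c_void_p)

    def addr_as_unsigned(obj):
        # Normalize through the pointer-sized unsigned range first (the
        # moral equivalent of casting through Py_uintptr_t), so a high
        # 32-bit address is neither sign-extended into a value above
        # 2**32 nor reported as a negative number.
        return id(obj) & ((1 << _PTR_BITS) - 1)

    print(hex(addr_as_unsigned(object())))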
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634499&group_id=5470 From noreply at sourceforge.net Sun Jan 14 01:05:09 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 16:05:09 -0800 Subject: [Patches] [ python-Patches-1619846 ] Bug fixes for int unification branch Message-ID: Patches item #1619846, was opened at 2006-12-20 21:36 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1619846&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None >Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Adam Olsen (rhamphoryncus) >Assigned to: Guido van Rossum (gvanrossum) Summary: Bug fixes for int unification branch Initial Comment: This patch should fix all the real bugs in the int unification branch. All the remaining bugs are either external to the branch or due to tests that need updating (mostly due to the names of int vs long). External bugs: test_socket: http://sourceforge.net/tracker/index.php?func=detail&aid=1619659&group_id=5470&atid=105470 test_class: seems to be caused by using new-style classes by default. Unrelated to int-unification. test_set: inheritance of __hash__. I believe this was fixed in p3yk already. Test failures due to naming differences: test_ctypes test_doctest test_generators test_genexps test_optparse test_pyexpat Tests needing updating, not just due to name differences: test_descr test_pickletools The following aspects need specific review: PyLong_FromVoidPtr was doing the cast wrong. GCC was compiling the (unsigned Py_LONG_LONG)p cast in such a way as to produce a value larger than 2**32, obviously wrong on this 32bit box, and it warned about the cast too. Making it cast to Py_uintptr_t first seems to have corrected both the behaviour and the warning, but may be wrong on other architectures. Many of my changes to use PyInt_CheckExact may be better served by creating a PyInt_CheckSmall macro that retains the range check but allows subclasses. Alternatively, the index interface could be used, but would require more rewriting perhaps best left until later. There are some areas that handled signed vs unsigned and int vs long a bit differently, and they may still need work. Hard to tell what behaviour is correct in such cases. Skipped files: Doc/ext/run-func.c Mac/Modules/ctl/_Ctlmodule.c Mac/Modules/dlg/_Dlgmodule.c Mac/Modules/win/_Winmodule.c Mac/Modules/pycfbridge.c Modules/carbonevt/_CarbonEvtmodule.c Modules/_sqlite/connection.c Modules/almodule.c Modules/cgensupport.c Modules/clmodule.c Modules/flmodule.c Modules/grpmodule.c Modules/posicmodule.c:conv_confname Modules/pyexpat.c Modules/svmodule.c Modules/termios.c Modules/_bsddb.c Modules/_sqlite/statement.c PC/_winreg.c Python/dynload_beos.c Python/mactoolboxglue.c Python/marshal.c Python/pythonrun.c:handle_system_exit RISCOS/Modules/drawfmodule.c RISCOS/Modules/swimodule.c ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 19:05 Message: Logged In: YES user_id=6380 Originator: NO I'll be taking over this branch. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2006-12-21 16:22 Message: Logged In: YES user_id=21627 Originator: NO Not this year anymore. I'll try to get to it early next year (hopefully first week of January). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-21 16:12 Message: Logged In: YES user_id=6380 Originator: NO Martin, do you have time to look at this? I'll play with it too but I'd like to have your opinion. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1619846&group_id=5470 From noreply at sourceforge.net Sun Jan 14 08:35:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 23:35:50 -0800 Subject: [Patches] [ python-Patches-1635058 ] htonl et al accept negative ints Message-ID: Patches item #1635058, was opened at 2007-01-14 01:35 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: htonl et al accept negative ints Initial Comment: Referencing bug 1619659 This patch ensures that htonl and friends never accept or return negative numbers, per the underlying C implementation. I wrote a test case to ensure things work as expected, and ensured all tests pass. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 From noreply at sourceforge.net Sun Jan 14 00:59:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 15:59:51 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 04:37 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) >Assigned to: Guido van Rossum (gvanrossum) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer.
This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 18:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 19:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 12:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to to string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 01:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. 
The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 23:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still hold. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 23:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 22:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. 
However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 21:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 21:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2007-01-10 15:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 15:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this non-obvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 13:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2).
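A minimal sketch of the two patterns being compared (loop sizes are illustrative):

    # Quadratic: without a lazy or in-place optimization, each += may
    # copy everything accumulated so far.
    x = u""
    for i in range(10000):
        x += u"y"              # O(len(x)) copy per step -> O(n**2) total

    # Linear: collect the pieces, join once at the end.
    z = []
    for i in range(10000):
        z.append(u"y")
    x = u"".join(z)            # a single O(n) pass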
While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 20:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 20:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 13:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 05:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them?
(currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 00:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 14 11:42:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 02:42:56 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) >Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time! > - Style: you set your tab stops to 4 spaces. That is an absolute > no-no! Sorry about that; I'll fix it if I resubmit. > - Segfault in test_array. It seems that it's receiving a unicode > slice object and treating it like a "classic" unicode object. I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? 
I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case. > - I got it to come to a grinding halt with the following worst-case > scenario: > > a = [] > while True: > x = u"x"*1000000 > x = x[30:60] # Short slice of long string > a.append(x) > > If you can't do better than that, I'll have to reject it. > > PS I used your combined patch, if it matters. It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60].simplify() # Short slice of long string a.append(x) .simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. 
The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to to string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still hold. (Besides, all three of those files will probably go away before Py3k ships.) 
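As a toy illustration of the worst-case retention problem discussed in these comments (pure Python; LazySlice is an invented stand-in for the patch's slice objects, not its real implementation):

    class LazySlice(object):
        # Toy model: remembers the base string instead of copying it.
        def __init__(self, base, start, stop):
            self.base, self.start, self.stop = base, start, stop
        def simplify(self):
            # Force a real copy, releasing the reference to the big base.
            return self.base[self.start:self.stop]

    big = u"x" * 1000000
    s = LazySlice(big, 30, 60)   # only 30 characters of interest...
    del big                      # ...but the 1 MB buffer stays alive via s.base
    s = s.simplify()             # now only the short copy remains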
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain that I thought, and anyway I figure the likelyhood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. 
Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. 
I'm not convinced that murking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO >From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) . Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). 
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemberg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often references using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? 
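A toy pure-Python model of the cost profile Larry describes (an invented class, not the patch itself): concatenation is O(1), the first access pays one O(n) rendering pass over all characters, and indexing is O(1) afterwards.

    class LazyConcat(object):
        def __init__(self, pieces):
            self.pieces = pieces               # strings and/or LazyConcat nodes
            self.rendered = None
        def __add__(self, other):
            return LazyConcat([self, other])   # O(1): build a two-node tree
        def _render(self):
            if self.rendered is None:
                parts, stack = [], [self]
                while stack:                   # walk the concatenation tree once
                    node = stack.pop()
                    if isinstance(node, LazyConcat):
                        stack.extend(reversed(node.pieces))
                    else:
                        parts.append(node)
                self.rendered = u"".join(parts)   # one O(n) pass
            return self.rendered
        def __getitem__(self, i):
            return self._render()[i]           # O(n) the first time, then O(1)

    s = LazyConcat([u"a"]) + u"b" + u"c"
    assert s[1] == u"b"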
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 14 12:44:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 03:44:50 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-14 11:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario: #define MAX_SLICE_DELTA (64*1024) if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) || (size_of_slice > (size_of_original / 2)) ) use_lazy_slice(); else create_string_as_normal(); You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time! > - Style: you set your tab stops to 4 spaces. That is an absolute > no-no! Sorry about that; I'll fix it if I resubmit. > - Segfault in test_array. It seems that it's receiving a unicode > slice object and treating it like a "classic" unicode object. I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. 
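The cutoff proposed at the top of this message can be modelled in pure Python (same illustrative constant as in that sketch; the names come from it, not from any released API):

    MAX_SLICE_DELTA = 64 * 1024

    def use_lazy_slice(size_of_slice, size_of_original):
        # Lazy only when the slice is nearly as large as the original,
        # so a short slice can never pin a huge buffer.
        return (size_of_slice + MAX_SLICE_DELTA > size_of_original
                or size_of_slice > size_of_original // 2)

    assert not use_lazy_slice(30, 1000000)   # the worst case above: copies eagerly
    assert use_lazy_slice(900000, 1000000)   # big slice: laziness still pays off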
Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case. > - I got it to come to a grinding halt with the following worst-case > scenario: > > a = [] > while True: > x = u"x"*1000000 > x = x[30:60] # Short slice of long string > a.append(x) > > If you can't do better than that, I'll have to reject it. > > PS I used your combined patch, if it matters. It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60].simplify() # Short slice of long string a.append(x) .simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. 
My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to to string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still hold. (Besides, all three of those files will probably go away before Py3k ships.) 
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-12 04:25
Message: Logged In: YES user_id=364875 Originator: YES

File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-12 03:12
Message: Logged In: YES user_id=364875 Originator: YES

Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds.

As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html

One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away.

As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.)

I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.

File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-12 02:50
Message: Logged In: YES user_id=364875 Originator: YES

File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-12 02:42
Message: Logged In: YES user_id=364875 Originator: YES

lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay.

So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return.
Document this with strong language for external C module authors.

2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

4. The patch is not accepted.

Of course, I'm open to suggestions of other approaches. (Not to mention patches!)

Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does.

As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination.

* Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2007-01-10 20:59
Message: Logged In: YES user_id=38388 Originator: NO

Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API.

Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory).

A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string.
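(As an aside, the "slice integers" technique lemburg mentions can be sketched in a few lines of Python. This is only an illustration of the idea, not code from mxTextTools or from any patch in this thread; find_words and the span list are invented for the example:)

    # Record (start, end) offsets into the base string instead of creating
    # a substring per token; materialize substrings only when needed.
    def find_words(text):
        spans = []
        i, n = 0, len(text)
        while i < n:
            if text[i].isspace():
                i += 1
                continue
            j = i
            while j < n and not text[j].isspace():
                j += 1
            spans.append((i, j))   # a "slice integer" pair; no new string yet
            i = j
        return spans

    text = u"the quick brown fox"
    spans = find_words(text)               # [(0, 3), (4, 9), (10, 15), (16, 19)]
    words = [text[s:e] for s, e in spans]  # pay for the strings only here

The point of keeping offsets is exactly the one lemburg makes: no per-token object overhead and no short-string allocations while parsing.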
I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design.

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-10 20:30
Message: Logged In: YES user_id=364875 Originator: YES

Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation.

Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom.

For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed.

And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-01-10 18:24
Message: Logged In: YES user_id=341410 Originator: NO

From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) .

Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data.

Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode?

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-09 01:26
Message: Logged In: YES user_id=364875 Originator: YES

Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)

File Added: lch.py3k.unicode.lazy.concat.patch.3.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-09 01:10
Message: Logged In: YES user_id=364875 Originator: YES

Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode().
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-08 18:50
Message: Logged In: YES user_id=364875 Originator: YES

jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup.

If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate.

lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2007-01-08 10:59
Message: Logged In: YES user_id=38388 Originator: NO

While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-01-07 05:08
Message: Logged In: YES user_id=341410 Originator: NO

What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
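(The cost model under discussion can be illustrated with a small pure-Python sketch. This shows only the semantics Larry describes, not the patch's C implementation; the LazyConcat class below is invented for the example:)

    class LazyConcat(object):
        """Illustration only: defer joining until a character is needed."""
        def __init__(self, left, right):
            self.pieces = [left, right]   # operands, possibly lazy themselves
            self.value = None             # rendered form, filled in lazily

        def __add__(self, other):         # O(1): just remember the operand
            return LazyConcat(self, other)

        def render(self):                 # O(total length), paid once
            if self.value is None:
                parts = [p.render() if isinstance(p, LazyConcat) else p
                         for p in self.pieces]
                self.value = u"".join(parts)
            return self.value

        def __getitem__(self, i):         # first call renders; then O(1)
            return self.render()[i]

    s = LazyConcat(u"a + b", u" is O(1); ") + u"s[i] renders once"
    print(s[0], len(s.render()))

So the answer to Josiah's question under this model: "+" is O(1), the first indexing operation pays the full linear rendering cost, and every indexing operation after that is O(1) again.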
----------------------------------------------------------------------

You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Sun Jan 14 18:04:58 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 14 Jan 2007 09:04:58 -0800
Subject: [Patches] [ python-Patches-1635058 ] htonl et al accept negative ints
Message-ID: 

Patches item #1635058, was opened at 2007-01-14 02:35 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: htonl et al accept negative ints

Initial Comment: Referencing bug 1619659 This patch ensures that htonl and friends never accept or return negative numbers, per the underlying C implementation. I wrote a test case to ensure things work as expected, and ensured all tests pass.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2007-01-14 12:04
Message: Logged In: YES user_id=6380 Originator: NO

Thanks, submitted. (Note that I had to fix the indentation in your patch; you used four spaces where the original code used tabs. Please be consistent!) Can you check if there's a need to update the docs? If there is, send me a doc patch and I'll apply it.

----------------------------------------------------------------------

You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470

From noreply at sourceforge.net Sun Jan 14 21:31:32 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 14 Jan 2007 12:31:32 -0800
Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax
Message-ID: 

Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax

Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly:

    def f(arg:expr, (nested1:expr, nested2:expr)) -> expr:
        suite

The function object has a new attribute, func_annotations, that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower.
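(For reference, here is what the proposed syntax looks like in practice. The example below uses the __annotations__ spelling under which the feature eventually shipped in Python 3 via PEP 3107; the patch in this thread spells the attribute func_annotations, and shipped Python 3 also dropped the nested-tuple parameter form:)

    # Annotations are arbitrary expressions, stored in a dict keyed by
    # argument name, with the return annotation under the key 'return'.
    def parse(data: bytes, strict: bool = True) -> str:
        return data.decode("ascii" if strict else "latin-1")

    print(parse.__annotations__)
    # {'data': <class 'bytes'>, 'strict': <class 'bool'>, 'return': <class 'str'>}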
----------------------------------------------------------------------

>Comment By: Tony Lownds (tonylownds)
Date: 2007-01-14 20:31
Message: Logged In: YES user_id=24100 Originator: YES

Combines the code paths for MAKE_FUNCTION and MAKE_CLOSURE. Fixes a crash where functions with closures and either annotations or keyword-only arguments result in MAKE_CLOSURE, but only MAKE_FUNCTION has the code to handle annotations or keyword-only arguments. Includes enough tests to trigger the bug.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2007-01-06 21:03
Message: Logged In: YES user_id=24100 Originator: YES

I tried to implement getargspec() as described, and unfortunately there is another wrinkle to consider. Keyword-only arguments may or may not have defaults. So the invariant described in getargspec()'s docstring can't be maintained when simply appending keyword-only arguments.

A tuple of four things is returned: (args, varargs, varkw, defaults). 'args' is a list of the argument names (it may contain nested lists). 'args' will include keyword-only argument names. 'varargs' and 'varkw' are the names of the * and ** arguments or None. 'defaults' is an n-tuple of the default values of the last n arguments.

The attached patch adds a 'getfullargspec' API that returns complete information; 'getargspec' raises an error if information would be lost; the order of arguments in 'formatargspec' is backwards compatible, so that formatargspec(*getargspec(f)) == formatargspec(*getfullargspec(f)) when getargspec(f) does not raise an error. PEP 362 could and probably should replace the new getfullargspec() function, so I did not implement an API more complicated than a tuple.

File Added: pydoc.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2007-01-06 20:05
Message: Logged In: YES user_id=24100 Originator: YES

Change peepholer to not bail in the presence of EXTENDED_ARG + MAKE_FUNCTION. Enforce the natural 16-bit limit of annotations in compile.c.

File Added: peepholer_and_max_annotations.patch

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2007-01-04 17:53
Message: Logged In: YES user_id=6380 Originator: NO

I like the following approach: (1) the old API continues to work for all functions, but provides incomplete information (not losing the kw-only args completely, but losing the fact that they are kw-only); (2) add a new API that provides all the relevant information. Maybe the new API should not return a 7-tuple but rather a structure with named attributes; that makes it more future-proof. Sorry, I don't have any good suggestions for new names.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2007-01-04 07:12
Message: Logged In: YES user_id=24100 Originator: YES

For getargs and getargvalues, including the names in positional args is an excellent strategy. There are uses (in cgitb) in the stdlib for getargvalues that then wouldn't need to be changed. The 2 uses of getargspec in the stdlib (one of which I missed, in DocXMLRPCServer) are both closely followed by formatargspec. I think those APIs should change or information will be lost. Alternatively, a new function (hopefully with a better name than getfullargspec :) could be made and getargspec could retain its API, but raise an error when keyword-only arguments are present.
    def getargspec(func):
        args, varargs, kwonlyargs, kwdefaults, varkw, defaults, ann = getfullargspec(func)
        if kwonlyargs:
            raise ValueError("function has keyword-only arguments, use getfullargspec!")
        return args, varargs, varkw, defaults

I'll update the patch to fix getargvalues and DocXMLRPCServer this weekend.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2007-01-04 05:22
Message: Logged In: YES user_id=6380 Originator: NO

Well, it depends on the context whether that matters. The kw-only args could just be included in the positional args (which have names anyway) and that wouldn't be so bad for some apps.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2007-01-04 05:17
Message: Logged In: YES user_id=24100 Originator: YES

I think everyone will have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2007-01-04 04:30
Message: Logged In: YES user_id=6380 Originator: NO

I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-12-28 06:53
Message: Logged In: YES user_id=33168 Originator: NO

I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-28 03:04
Message: Logged In: YES user_id=24100 Originator: YES

Nothing else on the C side of things. The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-12-28 01:38
Message: Logged In: YES user_id=6380 Originator: NO

Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that?

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-28 01:28
Message: Logged In: YES user_id=24100 Originator: YES

Fixed in latest patch. Also added VISIT call for func_annotations.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-12-28 00:40
Message: Logged In: YES user_id=6380 Originator: NO

I believe I've found a leak in the code that adds annotations to a function object.
See this session:

    >>> x = object()
    >>> import sys
    >>> sys.getrefcount(x)
    2
    >>> for i in range(100):
    ...     def f(x: x): pass
    ...
    >>> del f
    >>> sys.getrefcount(x)
    102

At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-23 19:05
Message: Logged In: YES user_id=24100 Originator: YES

Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for:

    def intmin(*a: int) -> int: pass

...help(intmin) will display:

    intmin(*a: int) -> int

File Added: pydoc.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-23 15:53
Message: Logged In: YES user_id=24100 Originator: YES

Fixed the non-C89 style lines and the formatting (hopefully in compatible style :)

File Added: opt_arg_ann.patch

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-12-22 21:41
Message: Logged In: YES user_id=6380 Originator: NO

Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-22 20:15
Message: Logged In: YES user_id=24100 Originator: YES

Changes:
1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope
2. Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch.

File Added: opt_arg_ann.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-21 20:21
Message: Logged In: YES user_id=24100 Originator: YES

Changes:
1. Address Neal's comments (I hope)
2. test_scope passes
3. Added some additional tests to test_compiler

Open implementation issues:
1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more.

File Added: opt_arg_ann.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-20 22:13
Message: Logged In: YES user_id=24100 Originator: YES

Changes:
1. Updated to apply cleanly
2. Fix to compile.c so that test_complex_args passes

Open implementation issues:
1. Neal's comments
2. test_scope fails
3. Output from Lib/compiler does not pass test_complex_args

File Added: opt_arg_ann.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-20 18:04
Message: Logged In: YES user_id=24100 Originator: YES

I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too.
Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 09:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 08:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? (Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! 
This is a great start.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-12-20 01:22
Message: Logged In: YES user_id=6380 Originator: NO

Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head?

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2006-12-11 17:29
Message: Logged In: YES user_id=764593 Originator: NO

Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-04 01:24
Message: Logged In: YES user_id=24100 Originator: YES

This patch implements optional argument syntax for Python 3000. The patch still has issues:
1. test_ast and test_scope fail.
2. Running the test suite after compiling the library with the compiler package causes failures
3. no docs
4. C-code reference counts and error checking needs a review

The syntax implemented is roughly:

    def f(arg:expr, (nested1:expr, nested2:expr)) -> expr:
        suite

The function object has a new attribute, func_annotations, that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations.

The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually.

----------------------------------------------------------------------

You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470

From noreply at sourceforge.net Sun Jan 14 21:32:14 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 14 Jan 2007 12:32:14 -0800
Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax
Message-ID: 

Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax

Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly:

    def f(arg:expr, (nested1:expr, nested2:expr)) -> expr:
        suite

The function object has a new attribute, func_annotations, that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower.
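(The crash that make_closure_fix addresses, described in Tony's 2007-01-14 20:31 comment above, is triggered by code of roughly this shape. The example is invented to illustrate the trigger and uses the __annotations__ spelling of the Python 3 that eventually shipped:)

    def outer(base):
        def inner(y: int) -> int:   # annotated *and* a closure over 'base',
            return base + y         # so it compiles via MAKE_CLOSURE
        return inner

    add_two = outer(2)
    print(add_two(3))               # 5
    print(add_two.__annotations__)  # {'y': <class 'int'>, 'return': <class 'int'>}

Before the fix, only the MAKE_FUNCTION path knew how to pop annotations (and keyword-only defaults) off the stack, so combining them with a closure crashed.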
----------------------------------------------------------------------

>Comment By: Tony Lownds (tonylownds)
Date: 2007-01-14 20:32
Message: Logged In: YES user_id=24100 Originator: YES

File Added: make_closure_fix.patch

----------------------------------------------------------------------

You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470

From noreply at sourceforge.net Sun Jan 14 22:57:30 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 14 Jan 2007 13:57:30 -0800
Subject: [Patches] [ python-Patches-1598415 ] Logging Module - followfile patch
Message-ID: 

Patches item #1598415, was opened at 2006-11-17 15:44 Message generated for change (Comment added) made by vsajip You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Modules Group: Python 2.5 >Status: Pending >Resolution: Fixed Priority: 5 Private: No Submitted By: chads (cjschr) Assigned to: Vinay Sajip (vsajip) Summary: Logging Module - followfile patch

Initial Comment: Pertaining to the FileHandler and the file being written to: It's possible that the file being written to will be rolled over by an external application such as newsyslog.
By default, FileHandler tracks the file descriptor, not the file. If the original file is renamed, the file descriptor is still updated; however, it's probably desired that continued updates to the original file take place instead. This patch adds an attribute to the FileHandler class constructor (and basicConfig kw as well). If the attribute evaluates to True, the filename, not the descriptor, is tracked. Basically, the code compares the file status from a previous emit call to the current call before the base class emit is called. If a difference in st_ino or st_dev is found, the current stream is flushed/closed and a new one, based on baseFilename, is created, file status is updated, and then the base class emit is called.

----------------------------------------------------------------------

>Comment By: Vinay Sajip (vsajip)
Date: 2007-01-14 21:57
Message: Logged In: YES user_id=308438 Originator: NO

WatchedFileHandler added to logging.handlers, checked into trunk. Documentation updated, too.

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2007-01-11 21:50
Message: Logged In: YES user_id=308438 Originator: NO

I've had a bit more of a think about this, and realised that I made a boo-boo in one of my earlier comments. Under Windows, log files are opened with exclusive locks, so that other processes cannot rename or move files which are open. So I believe the approach won't work at all under Windows. (Chad, sorry about making you redo the patch with ST_SIZE rather than ST_DEV and ST_INO). I also think this is a less common use case than warrants supporting it at the basicConfig() level, which is for really very basic usage configuration.

So I would advocate adding a WatchedFileHandler (in logging.handlers) which watches st_dev and st_ino (as per Chad's original patch) and closes the old file descriptor and reopens the file when a change is seen. Some recent changes checked into SVN trunk facilitate the reopening - I've added an _open() method to FileHandler to do this. Chad, what do you think of this approach?

----------------------------------------------------------------------

Comment By: chads (cjschr)
Date: 2006-11-20 17:06
Message: Logged In: YES user_id=1093928 Originator: YES

Uploaded the wrong diff. This is the correct one.

----------------------------------------------------------------------

Comment By: chads (cjschr)
Date: 2006-11-20 17:02
Message: Logged In: YES user_id=1093928 Originator: YES

Updated per vsajip to work on Windoze too. The code now checks for a current size < previous size (based on ST_SIZE).

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2006-11-19 20:32
Message: Logged In: YES user_id=308438 Originator: NO

This patch, relying as it does on Unix-specific details such as i-nodes, does not appear as if it will work under Windows. For that reason I will mark it as Pending and Invalid for now; if cjschr can update this tracker item with how the patch will work on Windows, I will look at it further. The SF system will automatically close it if no update is made to the item in approx. 2 weeks, though it can still be reopened after that.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-11-18 19:14
Message: Logged In: YES user_id=849994 Originator: NO

Assigning to Vinay.
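(The mechanism that was checked in can be sketched as follows. This is a simplified illustration of the st_dev/st_ino check described above, not the stdlib's actual WatchedFileHandler code; SimpleWatchedFileHandler is an invented name. It relies on the _open() method Vinay mentions adding to FileHandler:)

    import logging, os

    class SimpleWatchedFileHandler(logging.FileHandler):
        """Reopen baseFilename if the underlying file was moved or rotated."""
        def __init__(self, filename):
            logging.FileHandler.__init__(self, filename)
            st = os.stat(self.baseFilename)
            self.dev, self.ino = st.st_dev, st.st_ino

        def emit(self, record):
            try:
                st = os.stat(self.baseFilename)
                changed = (st.st_dev, st.st_ino) != (self.dev, self.ino)
            except OSError:                  # file was renamed or removed
                changed = True
            if changed:
                self.stream.close()
                self.stream = self._open()   # reopens baseFilename
                st = os.stat(self.baseFilename)
                self.dev, self.ino = st.st_dev, st.st_ino
            logging.FileHandler.emit(self, record)

Comparing (st_dev, st_ino) rather than file size catches the rename/rotate case directly: after newsyslog moves the file aside, a fresh stat of the original name yields a different inode, and the handler reopens the path.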
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 From noreply at sourceforge.net Sun Jan 14 23:24:05 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 14:24:05 -0800 Subject: [Patches] [ python-Patches-1635058 ] htonl et al accept negative ints Message-ID: Patches item #1635058, was opened at 2007-01-14 01:35 Message generated for change (Comment added) made by mark-roberts You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Closed Resolution: Accepted Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: htonl et al accept negative ints Initial Comment: Referencing bug 1619659 This patch ensures that htonl and friends never accept or return negative numbers, per the underlying C implementation. I wrote a test case to ensure things work as expected, and ensured all tests pass. ---------------------------------------------------------------------- >Comment By: Mark Roberts (mark-roberts) Date: 2007-01-14 16:24 Message: Logged In: YES user_id=1591633 Originator: YES Hmm, I'll remember consistency when working with the C implementation. The Python that I've looked at seems to always use 4 spaces. At any rate, here's a doc patch. It essentially just makes "n bit integers" read "n bit positive integers". Other than that, I can think of no way to update the docs to reflect the scope of this patch. Thanks for everything, Guido! File Added: bug_g119659_doc.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 11:04 Message: Logged In: YES user_id=6380 Originator: NO Thanks, submitted. (Note that I had to fix the indentation in your patch; you used four spaces where the original code used tabs. Please be consistent!) Can you check if there's a need to update the docs? If there is, send me a doc patch and I'll apply it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 From noreply at sourceforge.net Mon Jan 15 01:02:49 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 16:02:49 -0800 Subject: [Patches] [ python-Patches-1635058 ] htonl et al accept negative ints Message-ID: Patches item #1635058, was opened at 2007-01-14 02:35 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Closed Resolution: Accepted Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: htonl et al accept negative ints Initial Comment: Referencing bug 1619659 This patch ensures that htonl and friends never accept or return negative numbers, per the underlying C implementation. 
I wrote a test case to ensure things work as expected, and ensured all tests pass. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 19:02 Message: Logged In: YES user_id=6380 Originator: NO Thanks, applied! ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-14 17:24 Message: Logged In: YES user_id=1591633 Originator: YES Hmm, I'll remember consistency when working with the C implementation. The Python that I've looked at seems to always use 4 spaces. At any rate, here's a doc patch. It essentially just makes "n bit integers" read "n bit positive integers". Other than that, I can think of no way to update the docs to reflect the scope of this patch. Thanks for everything, Guido! File Added: bug_g119659_doc.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 12:04 Message: Logged In: YES user_id=6380 Originator: NO Thanks, submitted. (Note that I had to fix the indentation in your patch; you used four spaces where the original code used tabs. Please be consistent!) Can you check if there's a need to update the docs? If there is, send me a doc patch and I'll apply it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 From noreply at sourceforge.net Mon Jan 15 01:03:59 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 16:03:59 -0800 Subject: [Patches] [ python-Patches-1635454 ] CSV DictWriter Errors Message-ID: Patches item #1635454, was opened at 2007-01-14 18:03 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635454&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: CSV DictWriter Errors Initial Comment: In response to feature request 1634717. The DictWriter, with this patch, should return a list of all offending extraneous field names, instead of simply raising a failure. I could see a use case for this in error reporting. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635454&group_id=5470 From noreply at sourceforge.net Mon Jan 15 01:40:00 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 16:40:00 -0800 Subject: [Patches] [ python-Patches-1635473 ] strptime %F and %T directives Message-ID: Patches item #1635473, was opened at 2007-01-14 18:40 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: strptime %F and %T directives Initial Comment: In response to bug 1633628. %F and %T are valid directives. They are added to Lib/_strptime.py by expanding them into the equivalent %Y-%m-%d and %H:%M:%S sub-expressions. Includes a test case. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 From noreply at sourceforge.net Mon Jan 15 02:05:10 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 17:05:10 -0800 Subject: [Patches] [ python-Patches-1635473 ] strptime %F and %T directives Message-ID: Patches item #1635473, was opened at 2007-01-14 18:40 Message generated for change (Comment added) made by mark-roberts You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: strptime %F and %T directives Initial Comment: In response to bug 1633628. %F and %T are valid directives. They are added to Lib/_strptime.py by expanding them into the equivalent %Y-%m-%d and %H:%M:%S sub-expressions. Includes a test case. ---------------------------------------------------------------------- >Comment By: Mark Roberts (mark-roberts) Date: 2007-01-14 19:05 Message: Logged In: YES user_id=1591633 Originator: YES I took a look at the time documentation page, and it did not detail %F and %T, even though they were supported in strftime. I added them to the documentation page since strptime now supports them. File Added: bug_1633628_strptime_doc.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 From noreply at sourceforge.net Sun Jan 14 17:32:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 08:32:33 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 04:37 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted.
There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 11:32 Message: Logged In: YES user_id=6380 Originator: NO Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 06:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario: #define MAX_SLICE_DELTA (64*1024) if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) || (size_of_slice > (size_of_original / 2)) ) use_lazy_slice(); else create_string_as_normal(); You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 05:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time! > - Style: you set your tab stops to 4 spaces. That is an absolute > no-no! 
Sorry about that; I'll fix it if I resubmit. > - Segfault in test_array. It seems that it's receiving a unicode > slice object and treating it like a "classic" unicode object. I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case. > - I got it to come to a grinding halt with the following worst-case > scenario: > > a = [] > while True: > x = u"x"*1000000 > x = x[30:60] # Short slice of long string > a.append(x) > > If you can't do better than that, I'll have to reject it. > > PS I used your combined patch, if it matters. It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60].simplify() # Short slice of long string a.append(x) .simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 18:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. 
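To make the memory behavior in that worst-case loop concrete, here is a toy pure-Python model of what an unrendered lazy slice holds onto (an illustration only, not code from the patch, which does this in C inside unicodeobject.c):

    class ToyLazySlice(object):
        # Toy stand-in for the patch's lazy slice objects.
        def __init__(self, base, start, stop):
            self.base = base                 # keeps the whole original string alive
            self.start, self.stop = start, stop
            self.value = None                # not rendered yet
        def simplify(self):
            if self.value is None:
                self.value = self.base[self.start:self.stop]  # copy the 30 chars
                self.base = None             # the megabyte buffer can now be freed
            return self.value

    a = []
    for i in range(5):
        x = ToyLazySlice(u"x" * 1000000, 30, 60)
        a.append(x)       # without simplify(), each entry pins ~1M characters

Each unrendered slice pins roughly a megabyte of Py_UNICODE data for a 30-character result, which is exactly the pathology the MAX_SLICE_DELTA heuristic proposed above is meant to bound.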
---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 19:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 12:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 01:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 23:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk.
lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still hold. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 23:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 22:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt
This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 15:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory).
A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 15:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 13:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) . Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 20:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)
File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 20:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 13:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 05:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 00:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
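A toy pure-Python model may make the cost model being asked about concrete (an illustration only, not the C implementation): + builds a tree node without copying characters, and the first access renders the whole tree once.

    class ToyLazyConcat(object):
        def __init__(self, left, right):
            self.children = [left, right]    # O(1): no characters copied at + time
            self.value = None
        def __add__(self, other):
            return ToyLazyConcat(self, other)
        def _render(self):
            if self.value is None:
                parts, stack = [], [self]
                while stack:                 # iterative walk over the concat tree
                    node = stack.pop()
                    if isinstance(node, ToyLazyConcat):
                        if node.value is not None:
                            parts.append(node.value)
                        else:
                            stack.extend(reversed(node.children))
                    else:
                        parts.append(node)
                self.value = u"".join(parts) # O(total length), paid only once
                self.children = None         # drop the tree after rendering
            return self.value
        def __getitem__(self, i):
            return self._render()[i]         # first lookup O(total), later O(1)

    s = ToyLazyConcat(u"ab", u"cd") + u"ef"  # two constant-time concatenations
    print s[5]                               # renders u"abcdef" once; prints f

So in this model (a + b + c + ...)[i] costs one render, linear in the total length, on first access, with ordinary O(1) indexing afterwards, matching the description given above.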
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 14 00:01:03 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 15:01:03 -0800 Subject: [Patches] [ python-Patches-1563844 ] pybench support for IronPython Message-ID: Patches item #1563844, was opened at 2006-09-23 04:05 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563844&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: M.-A. Lemburg (lemburg) Summary: pybench support for IronPython Initial Comment: The following patch to pybench/pybench.py makes it work on IronPython. IronPython returns NotImplementedError for both gc.disable() and sys.setcheckinterval() - catch that and report it. This also requires patch #1563842, which fixes platform.py for IronPython. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-14 00:01 Message: Logged In: YES user_id=38388 Originator: NO Checked in together with revision 53414. Note that I haven't tested this on IronPython. Please reopen if the patch doesn't work. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2006-09-23 04:31 Message: Logged In: YES user_id=29957 Sigh. Somewhere I dropped the magic 'print' statements. Fixed patch applied. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563844&group_id=5470 From noreply at sourceforge.net Mon Jan 15 12:48:53 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 03:48:53 -0800 Subject: [Patches] [ python-Patches-1617699 ] slice-object support for ctypes Pointer/Array Message-ID: Patches item #1617699, was opened at 2006-12-18 05:28 Message generated for change (Comment added) made by twouters You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617699&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Heller (theller) Summary: slice-object support for ctypes Pointer/Array Initial Comment: Support for slicing ctypes' Pointer and Array types with slice objects, although only for step=1 case. (Backported from p3yk-noslice branch.) ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2007-01-15 12:48 Message: Logged In: YES user_id=34209 Originator: YES The point is that simple slicing will go away, and extended slices (with sliceobjects) are used in various places, and currently can't be passed on to ctypes arrays and pointers. 
That is to say, a Python class defining __getitem__ but not __getslice__ still supports slice syntax, but it can't do 'pointer[sliceobj]' -- it would have to do 'pointer[sliceobj.start:sliceobj.end]'. Also, because simple slices will go away, this code will have to be added to the p3yk branch in any case; having it in the trunk just makes for easier maintenance. Oh, and the non-support for steps other than 1 is not a fundamental issue, I just couldn't bear to write the code for that if you didn't think it would be useful, as I'd already written the same logic and arithmetic for array, tupleseq, mmap and I forget what else :P You can consider this code half-done, if you wish; I'll get to it again soon enough. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 21:48 Message: Logged In: YES user_id=11105 Originator: NO Thomas, a question: Since steps != 1 are not supported, does this patch have any value? IIUC, array[x:y] returns exactly the same as array[x:y:1] for all x and y values. Formally, the patch is missing unittests and documentation ;-). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:45 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617699&group_id=5470 From noreply at sourceforge.net Mon Jan 15 16:43:10 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 07:43:10 -0800 Subject: [Patches] [ python-Patches-1598415 ] Logging Module - followfile patch Message-ID: Patches item #1598415, was opened at 2006-11-17 09:44 Message generated for change (Comment added) made by cjschr You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.5 >Status: Open Resolution: Fixed Priority: 5 Private: No Submitted By: chads (cjschr) Assigned to: Vinay Sajip (vsajip) Summary: Logging Module - followfile patch Initial Comment: Pertaining to the FileHandler and the file being written to: It's possible that the file being written to will be rolled-over by an external application such as newsyslog. By default, FileHandler tracks the file descriptor, not the file. If the original file is renamed, writes still go to the renamed file through the old descriptor; however, it's probably desired that continued updates go to the original filename instead. This patch adds an attribute to the FileHandler class constructor (and basicConfig kw as well). If the attribute evaluates to True, the filename, not the descriptor, is tracked. Basically, the code compares the file status from a previous emit call to the current call before the base class emit is called. If a difference in st_ino or st_dev is found, the current stream is flushed and closed, a new one, based on baseFilename, is created, file status is updated, and then the base class emit is called.
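For readers skimming the archive, here is a hand-written condensation of the emit-time check described above (a sketch reconstructed from the description, not the patch itself; the WatchedFileHandler that was eventually checked in instead reuses the FileHandler._open() method Vinay mentions below):

    import logging, os
    from stat import ST_DEV, ST_INO

    class FollowingFileHandler(logging.FileHandler):
        # Sketch of the followfile idea: track the name, not the descriptor.
        def __init__(self, filename, mode='a'):
            logging.FileHandler.__init__(self, filename, mode)
            self._mode = mode
            st = os.stat(self.baseFilename)
            self.dev, self.ino = st[ST_DEV], st[ST_INO]

        def emit(self, record):
            # Compare the file now at baseFilename with the one held open.
            try:
                st = os.stat(self.baseFilename)
                changed = st[ST_DEV] != self.dev or st[ST_INO] != self.ino
            except OSError:          # renamed or removed, with no replacement yet
                changed = True
            if changed:
                self.stream.flush()
                self.stream.close()
                self.stream = open(self.baseFilename, self._mode)
                st = os.stat(self.baseFilename)
                self.dev, self.ino = st[ST_DEV], st[ST_INO]
            logging.FileHandler.emit(self, record)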
---------------------------------------------------------------------- >Comment By: chads (cjschr) Date: 2007-01-15 09:43 Message: Logged In: YES user_id=1093928 Originator: YES I like the implementation Vinay. Nice work. Thx ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-14 15:57 Message: Logged In: YES user_id=308438 Originator: NO WatchedFileHandler added to logging.handlers, checked into trunk. Documentation updated, too. ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-11 15:50 Message: Logged In: YES user_id=308438 Originator: NO I've had a bit more of a think about this, and realised that I made a boo-boo in one of my earlier comments. Under Windows, log files are opened with exclusive locks, so that other processes cannot rename or move files which are open. So I believe the approach won't work at all under Windows. (Chad, sorry about making you redo the patch with ST_SIZE rather than ST_DEV and ST_INO). I also think this is a less common use case than warrants supporting it at the basicConfig() level, which is for really very basic usage configuration. So I would advocate adding a WatchedFileHandler (in logging.handlers) which watches st_dev and st_ino (as per Chad's original patch) and closes the old file descriptor and reopens the file when a change is seen. Some recent changes checked into SVN trunk facilitate the reopening - I've added an _open() method to FileHandler to do this. Chad, what do you think of this approach? ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 11:06 Message: Logged In: YES user_id=1093928 Originator: YES Uploaded the wrong diff. This is the correct one. ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 11:02 Message: Logged In: YES user_id=1093928 Originator: YES Updated per vsajip to work on Windoze too. The code now checks for a current size < previous size (based on ST_SIZE). ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2006-11-19 14:32 Message: Logged In: YES user_id=308438 Originator: NO This patch, relying as it does on Unix-specific details such as i-nodes, does not appear as if it will work under Windows. For that reason I will mark it as Pending and Invalid for now, if cjschr can update this tracker item with how the patch will work on Windows, I will look at it further. The SF system will automatically close it if no update is made to the item in approx. 2 weeks, though it can still be reopened after that. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-11-18 13:14 Message: Logged In: YES user_id=849994 Originator: NO Assigning to Vinay. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 From noreply at sourceforge.net Mon Jan 15 17:29:21 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 08:29:21 -0800 Subject: [Patches] [ python-Patches-1598415 ] Logging Module - followfile patch Message-ID: Patches item #1598415, was opened at 2006-11-17 15:44 Message generated for change (Comment added) made by vsajip You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.5 >Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: chads (cjschr) Assigned to: Vinay Sajip (vsajip) Summary: Logging Module - followfile patch Initial Comment: Pertaining to the FileHandler and the file being written to: It's possible that the file being written to will be rolled-over by an external application such as newsyslog. By default, FileHandler tracks the file descriptor, not the file. If the original file is renamed, writes still go to the renamed file through the old descriptor; however, it's probably desired that continued updates go to the original filename instead. This patch adds an attribute to the FileHandler class constructor (and basicConfig kw as well). If the attribute evaluates to True, the filename, not the descriptor, is tracked. Basically, the code compares the file status from a previous emit call to the current call before the base class emit is called. If a difference in st_ino or st_dev is found, the current stream is flushed and closed, a new one, based on baseFilename, is created, file status is updated, and then the base class emit is called. ---------------------------------------------------------------------- >Comment By: Vinay Sajip (vsajip) Date: 2007-01-15 16:29 Message: Logged In: YES user_id=308438 Originator: NO You're welcome, Chad. Thanks for the idea. Closing this item. ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2007-01-15 15:43 Message: Logged In: YES user_id=1093928 Originator: YES I like the implementation Vinay. Nice work. Thx ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-14 21:57 Message: Logged In: YES user_id=308438 Originator: NO WatchedFileHandler added to logging.handlers, checked into trunk. Documentation updated, too. ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-11 21:50 Message: Logged In: YES user_id=308438 Originator: NO I've had a bit more of a think about this, and realised that I made a boo-boo in one of my earlier comments. Under Windows, log files are opened with exclusive locks, so that other processes cannot rename or move files which are open. So I believe the approach won't work at all under Windows. (Chad, sorry about making you redo the patch with ST_SIZE rather than ST_DEV and ST_INO). I also think this is a less common use case than warrants supporting it at the basicConfig() level, which is for really very basic usage configuration.
So I would advocate adding a WatchedFileHandler (in logging.handlers) which watches st_dev and st_ino (as per Chad's original patch) and closes the old file descriptor and reopens the file when a change is seen. Some recent changes checked into SVN trunk facilitate the reopening - I've added an _open() method to FileHandler to do this. Chad, what do you think of this approach? ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 17:06 Message: Logged In: YES user_id=1093928 Originator: YES Uploaded the wrong diff. This is the correct one. ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 17:02 Message: Logged In: YES user_id=1093928 Originator: YES Updated per vsajip to work on Windoze too. The code now checks for a current size < previous size (based on ST_SIZE). ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2006-11-19 20:32 Message: Logged In: YES user_id=308438 Originator: NO This patch, relying as it does on Unix-specific details such as i-nodes, does not appear as if it will work under Windows. For that reason I will mark it as Pending and Invalid for now, if cjschr can update this tracker item with how the patch will work on Windows, I will look at it further. The SF system will automatically close it if no update is made to the item in approx. 2 weeks, though it can still be reopened after that. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-11-18 19:14 Message: Logged In: YES user_id=849994 Originator: NO Assigning to Vinay. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 From noreply at sourceforge.net Mon Jan 15 19:53:24 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 10:53:24 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. 
This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-15 18:53 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and moreso if it was a bad idea). My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst-case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation: a = u"a" b = u"b" concat = a + b "concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 16:32 Message: Logged In: YES user_id=6380 Originator: NO Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) 
And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 11:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario: #define MAX_SLICE_DELTA (64*1024) if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) || (size_of_slice > (size_of_original / 2)) ) use_lazy_slice(); else create_string_as_normal(); You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time! > - Style: you set your tab stops to 4 spaces. That is an absolute > no-no! Sorry about that; I'll fix it if I resubmit. > - Segfault in test_array. It seems that it's receiving a unicode > slice object and treating it like a "classic" unicode object. I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case. > - I got it to come to a grinding halt with the following worst-case > scenario: > > a = [] > while True: > x = u"x"*1000000 > x = x[30:60] # Short slice of long string > a.append(x) > > If you can't do better than that, I'll have to reject it. > > PS I used your combined patch, if it matters. It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60].simplify() # Short slice of long string a.append(x) .simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) 
Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface.
Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful; it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because a wrapper wouldn't necessitate a (not insignificant) change in semantics and 3rd-party code, it would be acceptable.
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES
Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.)
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES
File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES
Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath.
For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES
File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES
lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors.

2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

4. The patch is not accepted.

Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower.
At the very least I suggest the patches are worthy of examination.

* Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO
Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design.
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES
Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this non-obvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted.
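For readers who want to see the gap being discussed, here is a minimal, self-contained timing sketch (illustrative only, in Python 2 syntax to match the rest of the thread; it is not part of any attached patch) contrasting repeated += with the ''.join() idiom that lazy concatenation is meant to make unnecessary:

    import time

    def concat_plus(n):
        # Repeated += can copy the whole accumulated string each
        # time through the loop: O(n**2) in the worst case.
        s = u""
        for i in xrange(n):
            s += u"x"
        return s

    def concat_join(n):
        # Collect the pieces and join once at the end: O(n).
        parts = []
        for i in xrange(n):
            parts.append(u"x")
        return u"".join(parts)

    for func in (concat_plus, concat_join):
        start = time.time()
        func(200000)
        print func.__name__, time.time() - start

With lazy concatenation, the first loop would build a tree of unrendered concatenation objects instead of copying on every iteration, so its cost approaches that of the second.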
----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO
From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) . Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode?
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES
Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)
File Added: lch.py3k.unicode.lazy.concat.patch.3.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES
Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode().
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES
jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO
While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g.
list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings
----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO
What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Mon Jan 15 19:54:10 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 10:54:10 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent.
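As a rough mental model of the "lazy slices" half of the patch (the real implementation lives at the C level in unicodeobject.c; the class and method names below are invented purely for illustration), a lazy slice can be pictured as a view that postpones the copy, which is also exactly how the worst-case scenario debated in the comments arises:

    class LazySlice(object):
        # Invented illustration, not the patch: remember the base string
        # and the slice bounds, and copy only when rendered.
        def __init__(self, base, start, stop):
            self.base = base
            self.start = start
            self.stop = stop
            self.value = None                 # unrendered

        def render(self):
            if self.value is None:
                self.value = self.base[self.start:self.stop]  # the real copy
                self.base = None              # drop the (possibly huge) base
            return self.value

    big = u"x" * 1000000
    tiny = LazySlice(big, 30, 60)   # O(1): no characters copied yet
    del big                         # ...but all 1000000 stay reachable via tiny.base
    print len(tiny.render())        # rendering copies 30 chars and releases the rest

The memory hazard is visible in the model: until render() runs, the 30-character "slice" keeps the million-character base alive.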
----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-15 18:54 Message: Logged In: YES user_id=364875 Originator: YES
As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and more so if it was a bad idea). My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable?
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 16:32 Message: Logged In: YES user_id=6380 Originator: NO
Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches.
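Larry's recap of lazy concatenation above translates into a pure-Python model in just a few lines (illustrative only; the names mirror the C-level PyUnicodeConcatenationObject loosely, and the real code avoids recursion where possible):

    class LazyConcat(object):
        # Invented illustration, not the patch: hold the two children
        # and render on first access.
        def __init__(self, left, right):
            self.left = left
            self.right = right
            self.value = None                 # "NULL" in the C version

        def render(self):
            if self.value is None:
                parts = []
                def walk(node):
                    if isinstance(node, LazyConcat):
                        walk(node.left)
                        walk(node.right)
                    else:
                        parts.append(node)
                walk(self)                    # walk the tree of children
                self.value = u"".join(parts)  # build the final buffer once
                self.left = self.right = None # free the children
            return self.value

    concat = LazyConcat(u"a", u"b")   # cheap: nothing copied yet
    print concat.render()             # first access renders: prints "ab"

Unlike the lazy-slice model, rendering here frees the children, so no oversized base string is pinned in memory afterwards.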
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Mon Jan 15 20:19:45 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 11:19:45 -0800 Subject: [Patches] [ python-Patches-1635473 ] strptime %F and %T directives Message-ID: Patches item #1635473, was opened at 2007-01-14 16:40 Message generated for change (Comment added) made by bcannon You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: strptime %F and %T directives Initial Comment: In response to bug 1633628. %F and %T are valid directives. These are added to Lib/_strptime.py via adding the Y-M-d H:M:S directives in sub-expressions. Includes a test case.
----------------------------------------------------------------------
>Comment By: Brett Cannon (bcannon) Date: 2007-01-15 11:19 Message: Logged In: YES user_id=357491 Originator: NO
Thanks for the work, Mark, but I am going to have to reject this patch. The F and T directives are not necessarily supported on every platform (at least to my knowledge), which is why they are not documented. Because of this I don't want to add support for them to strptime and have to start maintaining directives that are not documented.
----------------------------------------------------------------------
Comment By: Mark Roberts (mark-roberts) Date: 2007-01-14 17:05 Message: Logged In: YES user_id=1591633 Originator: YES
I took a look at the time documentation page, and it did not detail %F and %T, even though they were supported in strftime. I added them to the documentation page since strptime now supports them.
File Added: bug_1633628_strptime_doc.patch
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470

From noreply at sourceforge.net Tue Jan 16 01:02:35 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 16:02:35 -0800 Subject: [Patches] [ python-Patches-1037516 ] ftplib PASV error bug Message-ID: Patches item #1037516, was opened at 2004-09-30 15:35 Message generated for change (Comment added) made by wayland You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1037516&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tim Nelson (wayland) Assigned to: Nobody/Anonymous (nobody) Summary: ftplib PASV error bug Initial Comment: Hi. If ftplib gets an error while doing the PASV section of ntransfercmd, it dies. I've altered it so that ntransfercmd does an autodetect, if an autodetect hasn't been done yet. If there are any problems (as I'm not a python programmer :) ), please either fix them or let me know.
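The attachment itself never made it into the tracker (see the follow-up comments), so as a sketch only: the described behavior amounts to retrying the transfer with passive mode toggled off when PASV fails. FTP.ntransfercmd, FTP.set_pasv, and ftplib.all_errors are real ftplib APIs; the wrapper function and its retry policy are invented here for illustration:

    import ftplib

    def ntransfercmd_autodetect(ftp, cmd):
        # Try passive (PASV) mode first; if the server rejects it,
        # switch to active (PORT) mode and retry once.
        try:
            return ftp.ntransfercmd(cmd)
        except ftplib.all_errors:
            ftp.set_pasv(False)
            return ftp.ntransfercmd(cmd)

An in-library fix along these lines would presumably live inside FTP.ntransfercmd itself and remember the outcome, so the detection happens at most once per connection.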
----------------------------------------------------------------------
>Comment By: Tim Nelson (wayland) Date: 2007-01-16 11:02 Message: Logged In: YES user_id=401793 Originator: YES
Oops. I probably did, but I don't work in that job any more, so I'm afraid I don't have access to it. Sorry. You should, however, be able to correct it from the description.
----------------------------------------------------------------------
Comment By: Andrew Bennetts (spiv) Date: 2004-10-06 20:49 Message: Logged In: YES user_id=50945
Did you mean to submit a patch with this bug report? It sounds like you did, but there's no files attached to this bug.
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1037516&group_id=5470

From noreply at sourceforge.net Tue Jan 16 16:33:59 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 07:33:59 -0800 Subject: [Patches] [ python-Patches-1636874 ] File Read/Write Flushing Patch Message-ID: Patches item #1636874, was opened at 2007-01-16 15:33 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1636874&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Windows Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: jurojin (jurojin) Assigned to: Nobody/Anonymous (nobody) Summary: File Read/Write Flushing Patch Initial Comment: The other night I was watching a Google tech talk about Python 3000, and Guido mentioned some problems with the C standard I/O library. In particular he highlighted an issue with switching between reading and writing without flushing, and the fact that it caused serious errors. Not that I don't think it's a good idea to write a new I/O library, but I wondered if it was the same problem I've encountered. It only happens on Windows as far as I know, but the fix is simple... Assuming you have a handle to the file called "Handle" and a Flush() method, the following logic for read and write will allow you to detect and prevent the problem. Add this to the Read() method before reading takes place:

    if ( Handle && (Handle->_flag & _IORW) &&
         (Handle->_flag & (_IOREAD | _IOWRT)) == _IOWRT )
    {
        Flush();
        Handle->_flag |= _IOREAD;
    }

Add this to the Write() method before writing takes place:

    if ( Handle && (Handle->_flag & _IORW) &&
         (Handle->_flag & (_IOREAD | _IOWRT)) == _IOREAD )
    {
        Flush();
        Handle->_flag |= _IOWRT;
    }

Emerson
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1636874&group_id=5470

From noreply at sourceforge.net Tue Jan 16 23:08:14 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 14:08:14 -0800 Subject: [Patches] [ python-Patches-1637157 ] urllib: change email.Utils -> email.utils Message-ID: Patches item #1637157, was opened at 2007-01-16 14:08 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637157&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: change email.Utils -> email.utils Initial Comment: urllib uses the old name email.Utils instead of the new name email.utils. This confuses py2app and possibly other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right).
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637157&group_id=5470

From noreply at sourceforge.net Tue Jan 16 23:09:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 14:09:27 -0800 Subject: [Patches] [ python-Patches-1637159 ] urllib2: email.Utils->email.utils Message-ID: Patches item #1637159, was opened at 2007-01-16 14:09 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637159&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2: email.Utils->email.utils Initial Comment: urllib2 uses the old name email.Utils instead of the new name email.utils. This may confuse py2app and/or other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right).
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637159&group_id=5470

From noreply at sourceforge.net Tue Jan 16 23:11:16 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 14:11:16 -0800 Subject: [Patches] [ python-Patches-1637162 ] smtplib email renames Message-ID: Patches item #1637162, was opened at 2007-01-16 14:11 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637162&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: smtplib email renames Initial Comment: smtplib uses the old names email.Utils and email.base64MIME instead of the new email.utils and email.base64mime. This may confuse py2app and/or other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right).
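For context: these three patches chase the email 4.0 package reorganization that shipped with Python 2.5, where the mixed-case module names (email.Utils, email.base64MIME) became lowercase (email.utils, email.base64mime). The old names still import at runtime through aliases, but tools such as py2app scan source text for the literal spelling of imports. A version-straddling import (illustrative, not taken from the patches themselves) looks like:

    try:
        from email.utils import parsedate    # new lowercase name, email 4.0 / Python 2.5
    except ImportError:
        from email.Utils import parsedate    # old name, Python < 2.5

    print parsedate("Tue, 16 Jan 2007 14:08:14 -0800")

Since the stdlib modules being patched only need to run on their own Python version, the patches can simply switch to the new spelling outright.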
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637162&group_id=5470 From noreply at sourceforge.net Wed Jan 17 07:55:45 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 22:55:45 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 14:55 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) >Assigned to: Thomas Wouters (twouters) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain, when someone only replaces one of them (in sitecustomize.) Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this, from a unittest :P ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 22:55 Message: Logged In: YES user_id=33168 Originator: NO I can think of a nasty way to test this, but it's not really worth it. You'd need to 'install' your own sitecustomize.py by setting PYTHONPATH and spawning a python. Ok, so it's not a real unit test, but it is a test. :-) This looks like it will also crash (before and after the patch) if sys.std{in,out,err} are just deleted rather than replaced (pythonrun.c). sysmodule.c looks fine. I think this is fine for 2.5.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Wed Jan 17 07:56:35 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 22:56:35 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 14:55 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. 
Category: Core (C code) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Wouters (twouters) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain, when someone only replaces one of them (in sitecustomize.) Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this, from a unittest :P ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 22:56 Message: Logged In: YES user_id=33168 Originator: NO Forgot to mention that I agree about the warning. If no one noticed so far, this is such an obscure case, it's not that important to warn. Either way is fine with me. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 22:55 Message: Logged In: YES user_id=33168 Originator: NO I can think of a nasty way to test this, but it's not really worth it. You'd need to 'install' your own sitecustomize.py by setting PYTHONPATH and spawning a python. Ok, so it's not a real unit test, but it is a test. :-) This looks like it will also crash (before and after the patch) if sys.std{in,out,err} are just deleted rather than replaced (pythonrun.c). sysmodule.c looks fine. I think this is fine for 2.5.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Wed Jan 17 08:09:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 23:09:11 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 05:29 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Martin Kammerhofer (mkam) >Assigned to: Thomas Heller (theller) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. 
It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux-specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 23:09 Message: Logged In: YES user_id=33168 Originator: NO Thomas, I don't see any (public) API changes and this fixes a bug. I don't see a reason not to fix this in 2.5.1. If you are comfortable with fixing, apply the patch. Also, please update Misc/NEWS. Thanks! ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 12:21 Message: Logged In: YES user_id=11105 Originator: NO Committed into trunk as revision 53402. Thanks for the patch and the work on it. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 12:11 Message: Logged In: YES user_id=11105 Originator: NO Neal, I think this can go into the release25-maint branch since it repairs the ctypes.util.find_library function on BSD systems. What do you think? ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 03:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails.) I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 12:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running Freebsd6.0, NetBSD3.0, and OpenBSD3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows:

    FreeBSD6.0: libc.so.6, libm.so.4
    NetBSD3.0: libc.so.12, libm.so.0
    OpenBSD3.9: libc.so.39.0, libm.so.2.1

If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 10:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 03:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.)
File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 02:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD has used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the aout executable format.) Unfortunately my freebsd patch has the assumption of a single version number built in; more specifically the cmp(* map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 12:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 in OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-07 23:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 13:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 11:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Wed Jan 17 08:42:42 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 23:42:42 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-11 23:13 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) >Assigned to: Guido van Rossum (gvanrossum) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 23:42 Message: Logged In: YES user_id=33168 Originator: NO Guido, this is the patch I was talking about wrt supporting a print function in 2.6. exec could get similar treatment. You mentioned in mail that things like except E as V: can go in without a future stmt. I agree. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-11 23:31 Message: Logged In: YES user_id=29957 Originator: YES Updated version of patch - fixes interactive mode, adds builtins.print File Added: print_function.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Wed Jan 17 16:24:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 07:24:27 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-12 02:13 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) >Assigned to: Nobody/Anonymous (nobody) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. 
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-17 10:24 Message: Logged In: YES user_id=6380 Originator: NO I don't think we need to do anything special for exec, as the exec(s, locals, globals) syntax is already (still :-) supported in 2.x with identical semantics as in 3.0. except E as V *syntax* can go in without a future stmt; and (only when that syntax is used) it should also enforce the new semantics (V must be a simple name and is deleted at the end of the except clause). I think Anthony's patch is a great idea, but I'll refrain from reviewing it. I'd say "just do it". :-) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 02:42 Message: Logged In: YES user_id=33168 Originator: NO Guido, this is the patch I was talking about wrt supporting a print function in 2.6. exec could get similar treatment. You mentioned in mail that things like except E as V: can go in without a future stmt. I agree. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-12 02:31 Message: Logged In: YES user_id=29957 Originator: YES Updated version of patch - fixes interactive mode, adds builtins.print File Added: print_function.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Wed Jan 17 16:58:31 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 07:58:31 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-12 08:13 Message generated for change (Comment added) made by twouters You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Nobody/Anonymous (nobody) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2007-01-17 16:58 Message: Logged In: YES user_id=34209 Originator: NO You seem to have '#if 0'ed-out some code related to the with/as-statement warnings; I suggest just removing them. Since you're in this code now, it might make sense to provide a commented out warning about the use of the print statement, so we won't have to figure it out later (in Python 2.9 or when we add -Wp3yk.) 
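For reference, a minimal sketch of the usage the patch aims for, assuming the keyword arguments (sep, end, file) of builtin_print() in the p3yk trunk carry over unchanged:

    from __future__ import print_function
    import sys

    print("ham", "eggs", sep=", ")        # print is now a function, not a statement
    print("warning", file=sys.stderr)     # the file keyword replaces ">> stream"
    print("no newline", end="")           # end replaces the trailing comma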
It needs a test, and probably a doc change somewhere. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-17 16:24 Message: Logged In: YES user_id=6380 Originator: NO I don't think we need to do anything special for exec, as the exec(s, locals, globals) syntax is already (still :-) supported in 2.x with identical semantics as in 3.0. except E as V *syntax* can go in without a future stmt; and (only when that syntax is used) it should also enforce the new semantics (V must be a simple name and is deleted at the end of the except clause). I think Anthony's patch is a great idea, but I'll refrain from reviewing it. I'd say "just do it". :-) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 08:42 Message: Logged In: YES user_id=33168 Originator: NO Guido, this is the patch I was talking about wrt supporting a print function in 2.6. exec could get similar treatment. You mentioned in mail that things like except E as V: can go in without a future stmt. I agree. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-12 08:31 Message: Logged In: YES user_id=29957 Originator: YES Updated version of patch - fixes interactive mode, adds builtins.print File Added: print_function.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Wed Jan 17 20:59:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 11:59:11 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) >Group: Python 2.5 >Status: Closed >Resolution: Fixed Priority: 9 Private: No Submitted By: Martin Kammerhofer (mkam) Assigned to: Thomas Heller (theller) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. 
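For illustration, the call the patch repairs, with the FreeBSD 6.0 return values Thomas reports earlier in this thread (before the patch, both calls returned None there):

    from ctypes.util import find_library

    print find_library('c')    # 'libc.so.6' on FreeBSD 6.0
    print find_library('m')    # 'libm.so.4' on FreeBSD 6.0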
---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-17 20:59 Message: Logged In: YES user_id=11105 Originator: NO Thanks, Neal, and Martin, again. Committed as r53471 (and r53473 for Misc/NEWS) in the release25-maint branch. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 08:09 Message: Logged In: YES user_id=33168 Originator: NO Thomas, I don't see any (public) API changes and this fixes a bug. I don't see a reason not to fix this in 2.5.1. If you are comfortable with fixing, apply the patch. Also, please update Misc/NEWS. Thanks! ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 21:21 Message: Logged In: YES user_id=11105 Originator: NO Committed into trunk as revision 53402. Thanks for the patch and the work on it. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 21:11 Message: Logged In: YES user_id=11105 Originator: NO Neal, I think this can go into the release25-maint branch since it repairs the ctypes.util.find_library function on BSD systems. What do you think? ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 12:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails.) I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running Freebsd6.0, NetBSD3.0, and OpenBSD3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows:

    FreeBSD6.0: libc.so.6, libm.so.4
    NetBSD3.0: libc.so.12, libm.so.0
    OpenBSD3.9: libc.so.39.0, libm.so.2.1

If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.)
File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD has used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the aout executable format.) Unfortunately my freebsd patch has the assumption of a single version number built in; more specifically the cmp(* map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 in OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Wed Jan 17 21:07:38 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 12:07:38 -0800 Subject: [Patches] [ python-Patches-1638033 ] Add httponly to Cookie module Message-ID: Patches item #1638033, was opened at 2007-01-17 21:07 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Arvin Schnell (arvins) Assigned to: Nobody/Anonymous (nobody) Summary: Add httponly to Cookie module Initial Comment: Add the Microsoft extension httponly to the Cookie module.
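A minimal sketch of the intended usage, assuming the patch registers the attribute under the lowercase key 'httponly' by analogy with the existing 'secure' attribute:

    from Cookie import SimpleCookie

    c = SimpleCookie()
    c['session'] = 'abc123'
    c['session']['httponly'] = True   # assumed key; see the patch for the actual name
    print c.output()                  # e.g. Set-Cookie: session=abc123; HttpOnly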
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 From noreply at sourceforge.net Thu Jan 18 04:52:36 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 19:52:36 -0800 Subject: [Patches] [ python-Patches-1638243 ] compiler.pycodegen causes crashes when compiling 'with' Message-ID: Patches item #1638243, was opened at 2007-01-17 22:52 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638243&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Parser/Compiler Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: kirat (kirat) Assigned to: Nobody/Anonymous (nobody) Summary: compiler.pycodegen causes crashes when compiling 'with' Initial Comment: The compiler package in the python library is missing a LOAD/DELETE just before the WITH_CLEANUP instruction. Also transformer isn't creating the with_var as an assignment. So the following little code snippet will crash if you compile and run it with compiler.compile():

    class TrivialContext:
        def __enter__(self):
            return self
        def __exit__(self,*exc_info):
            pass

    def f():
        with TrivialContext() as tc:
            return 1

    f()

The fix is just a few lines. I'm enclosing a patch against the python 2.5 source. I've also added the above as a test case to the test_compiler.py file. regards, -Kirat ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638243&group_id=5470 From noreply at sourceforge.net Thu Jan 18 20:03:34 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 18 Jan 2007 11:03:34 -0800 Subject: [Patches] [ python-Patches-1638879 ] Fix to the long("123\0", 10) problem Message-ID: Patches item #1638879, was opened at 2007-01-18 14:03 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638879&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Calvin Spealman (ironfroggy) Assigned to: Nobody/Anonymous (nobody) Summary: Fix to the long("123\0", 10) problem Initial Comment: This is a simple patch adapted from the int_new function to the long_new function.
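A hedged illustration of the inconsistency being fixed; the pre-patch long() behaviour is inferred from the summary (int_new already rejects embedded null bytes, and the patch copies that check to long_new):

    int("123\0", 10)    # raises ValueError (embedded null byte)
    long("123\0", 10)   # before the patch, long_new lacked the equivalent check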
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638879&group_id=5470 From noreply at sourceforge.net Fri Jan 19 16:06:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 19 Jan 2007 07:06:27 -0800 Subject: [Patches] [ python-Patches-1638033 ] Add httponly to Cookie module Message-ID: Patches item #1638033, was opened at 2007-01-17 15:07 Message generated for change (Comment added) made by jimjjewett You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Arvin Schnell (arvins) Assigned to: Nobody/Anonymous (nobody) Summary: Add httponly to Cookie module Initial Comment: Add the Microsoft extension httponly to the Cookie module. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-19 10:06 Message: Logged In: YES user_id=764593 Originator: NO The documentation change should say what the attribute does. (It requests that the cookie be hidden from javascript, and available only to http requests.) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 From noreply at sourceforge.net Fri Jan 19 18:01:21 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 19 Jan 2007 09:01:21 -0800 Subject: [Patches] [ python-Patches-1638033 ] Add httponly to Cookie module Message-ID: Patches item #1638033, was opened at 2007-01-17 21:07 Message generated for change (Comment added) made by arvins You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Arvin Schnell (arvins) Assigned to: Nobody/Anonymous (nobody) Summary: Add httponly to Cookie module Initial Comment: Add the Microsoft extension httponly to the Cookie module. ---------------------------------------------------------------------- >Comment By: Arvin Schnell (arvins) Date: 2007-01-19 18:01 Message: Logged In: YES user_id=698939 Originator: YES Sure, I have added some documentation to the patch. File Added: python.diff ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-19 16:06 Message: Logged In: YES user_id=764593 Originator: NO The documentation change should say what the attribute does. (It requests that the cookie be hidden from javascript, and available only to http requests.)
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 From noreply at sourceforge.net Sat Jan 20 02:15:21 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 19 Jan 2007 17:15:21 -0800 Subject: [Patches] [ python-Patches-1639973 ] email.utils.parsedate documentation Message-ID: Patches item #1639973, was opened at 2007-01-19 19:15 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1639973&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: email.utils.parsedate documentation Initial Comment: See bug 1629566 (python.org/sf/1629566) for discussion. This patch eliminates any ambiguity in the documentation regarding which fields of the time tuple it refers to. This patch specifies the documentation in both librfc822.tex and emailutil.tex. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1639973&group_id=5470 From noreply at sourceforge.net Sat Jan 20 03:44:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 19 Jan 2007 18:44:33 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 17:46 Message generated for change (Comment added) made by mark-roberts You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399. Definitely a backport candidate. ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-19 20:44 Message: Logged In: YES user_id=1591633 Originator: NO Patch looks good to me, and the tests still pass. If it matters, I would like to see a test case presented in the patch as well.
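A hypothetical sketch of the kind of test case Mark asks for; the local server setup is illustrative, and the actual patch may verify socket closure differently:

    import BaseHTTPServer
    import threading
    import urllib2

    class OneShotHandler(BaseHTTPServer.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header('Content-Length', '2')
            self.end_headers()
            self.wfile.write('ok')

    # serve exactly one request on an ephemeral port in the background
    server = BaseHTTPServer.HTTPServer(('127.0.0.1', 0), OneShotHandler)
    threading.Thread(target=server.handle_request).start()

    f = urllib2.urlopen('http://127.0.0.1:%d/' % server.server_port)
    f.read()
    f.close()   # the fix in #1601399 is about making this really close the socket
    server.server_close()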
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Sat Jan 20 10:16:44 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 20 Jan 2007 01:16:44 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-20 09:16 Message: Logged In: YES user_id=364875 Originator: YES Whoops, sorry. I refreshed a summary page I had lying around, which I guess re-posted the comment! Didn't mean to spam you with extra updates. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-20 09:14 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and moreso if it was a bad idea). My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst-case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. 
A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-15 18:54 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and moreso if it was a bad idea). My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst-case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-15 18:53 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and moreso if it was a bad idea).
My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst-case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 16:32 Message: Logged In: YES user_id=6380 Originator: NO Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches.
---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 11:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario:

    #define MAX_SLICE_DELTA (64*1024)
    if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) ||
         (size_of_slice > (size_of_original / 2)) )
        use_lazy_slice();
    else
        create_string_as_normal();

You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time!

> - Style: you set your tab stops to 4 spaces. That is an absolute
> no-no!

Sorry about that; I'll fix it if I resubmit.

> - Segfault in test_array. It seems that it's receiving a unicode
> slice object and treating it like a "classic" unicode object.

I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case.

> - I got it to come to a grinding halt with the following worst-case
> scenario:
>
>     a = []
>     while True:
>         x = u"x"*1000000
>         x = x[30:60] # Short slice of long string
>         a.append(x)
>
> If you can't do better than that, I'll have to reject it.
>
> PS I used your combined patch, if it matters.

It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario:

    a = []
    while True:
        x = u"x"*1000000
        x = x[30:60].simplify() # Short slice of long string
        a.append(x)

.simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario.
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far:

- Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces.

- Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object.

- I got it to come to a grinding halt with the following worst-case scenario:

    a = []
    while True:
        x = u"x"*1000000
        x = x[30:60] # Short slice of long string
        a.append(x)

If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this).
Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors.

2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

4. The patch is not accepted.

Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer.
No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) .
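To make that advice concrete, here is a small self-contained illustration of the two patterns (plain Python, not taken from any of the patches discussed here):

    pieces = [u"spam"] * 10000

    # The quadratic pattern: each += may copy the whole accumulated string.
    x = u""
    for y in pieces:
        x += y

    # The rewrite suggested on python-list: collect the pieces, join once.
    z = []
    for y in pieces:
        z.append(y)
    assert x == u''.join(z)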
Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1: it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g.
if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sat Jan 20 10:14:41 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 20 Jan 2007 01:14:41 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-20 09:14 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and more so if it was a bad idea).
My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 16:32 Message: Logged In: YES user_id=6380 Originator: NO Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches.
---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 11:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario:

    #define MAX_SLICE_DELTA (64*1024)

    if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) ||
         (size_of_slice > (size_of_original / 2)) )
        use_lazy_slice();
    else
        create_string_as_normal();

You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time!

> - Style: you set your tab stops to 4 spaces. That is an absolute
>   no-no!

Sorry about that; I'll fix it if I resubmit.

> - Segfault in test_array. It seems that it's receiving a unicode
>   slice object and treating it like a "classic" unicode object.

I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case.

> - I got it to come to a grinding halt with the following worst-case
>   scenario:
>
>     a = []
>     while True:
>         x = u"x"*1000000
>         x = x[30:60] # Short slice of long string
>         a.append(x)
>
> If you can't do better than that, I'll have to reject it.
>
> PS I used your combined patch, if it matters.

It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario:

    a = []
    while True:
        x = u"x"*1000000
        x = x[30:60].simplify() # Short slice of long string
        a.append(x)

.simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario.
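For clarity, the heuristic proposed above can be restated as runnable Python (the function and argument names are invented for illustration):

    MAX_SLICE_DELTA = 64 * 1024

    def should_use_lazy_slice(size_of_slice, size_of_original):
        # Stay lazy only when the slice is large relative to the original
        # (or the original is small anyway), so that a tiny slice can
        # never pin a huge buffer in memory.
        return (size_of_slice + MAX_SLICE_DELTA > size_of_original or
                size_of_slice > size_of_original // 2)

    print(should_use_lazy_slice(30, 1000000))       # False: copy as normal
    print(should_use_lazy_slice(900000, 1000000))   # True: keep the lazy slice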
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far:

- Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces.

- Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object.

- I got it to come to a grinding halt with the following worst-case scenario:

    a = []
    while True:
        x = u"x"*1000000
        x = x[30:60] # Short slice of long string
        a.append(x)

If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed-size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this).
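To illustrate, one possible shape for such a wrapper type (a toy sketch, not from any of the attached patches; note that a view deliberately keeps the whole base string alive, which is exactly the memory trade-off discussed above):

    class StrView(object):
        # Records (base, start, stop) instead of copying characters;
        # a step-1 slice of a view is again a view.
        def __init__(self, base, start=0, stop=None):
            self.base = base
            self.start = start
            self.stop = len(base) if stop is None else stop

        def __len__(self):
            return self.stop - self.start

        def __getitem__(self, i):
            if isinstance(i, slice):
                start, stop, step = i.indices(len(self))
                if step == 1:
                    return StrView(self.base, self.start + start,
                                   self.start + stop)
            return self.materialize()[i]

        def materialize(self):
            # The only point where characters are actually copied.
            return self.base[self.start:self.stop]

    v = StrView(u"x" * 1000000)[30:60]   # O(1): no megabyte copy is made
    print(len(v))                        # 30
    print(v.materialize())               # the 30 characters, copied only now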
Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because a view wouldn't necessitate a (not insignificant) change in semantics and 3rd-party code, it would be acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. 
No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) .
Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1: it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g.
if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 21 01:08:08 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 20 Jan 2007 16:08:08 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 23:46 Message generated for change (Comment added) made by jjlee You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399 Definitely a backport candidate. ---------------------------------------------------------------------- >Comment By: John J Lee (jjlee) Date: 2007-01-21 00:08 Message: Logged In: YES user_id=261020 Originator: YES Added tests. File Added: urllib2_close_socket_v2.patch ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-20 02:44 Message: Logged In: YES user_id=1591633 Originator: NO Patch looks good to me, and the tests still pass. If it matters, I would like to see a test case presented in the patch as well. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Sun Jan 21 06:26:25 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 20 Jan 2007 21:26:25 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 17:46 Message generated for change (Comment added) made by mark-roberts You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399 Definitely a backport candidate. ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-20 23:26 Message: Logged In: YES user_id=1591633 Originator: NO I'd say it looks good. Now let's see if we can get someone to apply it for us. Thanks for adding the tests! ---------------------------------------------------------------------- Comment By: John J Lee (jjlee) Date: 2007-01-20 18:08 Message: Logged In: YES user_id=261020 Originator: YES Added tests. File Added: urllib2_close_socket_v2.patch ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-19 20:44 Message: Logged In: YES user_id=1591633 Originator: NO Patch looks good to me, and the tests still pass. If it matters, I would like to see a test case presented in the patch as well. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Sun Jan 21 10:33:31 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 21 Jan 2007 01:33:31 -0800 Subject: [Patches] [ python-Patches-1610575 ] C99 _Bool support for struct Message-ID: Patches item #1610575, was opened at 2006-12-07 06:37 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610575&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.6 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: David Remahl (chmod007) Assigned to: Nobody/Anonymous (nobody) Summary: C99 _Bool support for struct Initial Comment: C99 adds the fundamental _Bool integer type (fundamental in the sense that it is not equivalent to or a composite of any other C type). Its size can vary from platform to platform; the only restriction imposed by the C standard is that it must be able to hold the values 0 or 1. Typically, sizeof _Bool is 1 or 4. A struct module user trying to parse a native C structure that contains a _Bool member faces a problem: struct does not have a format character for _Bool. One is forced to hardcode a size for bool (use a char or an int instead). This patch adds support for a new format character, 't', representing the fundamental type _Bool. It is handled semantically as representing pure booleans -- when packing a structure the truth value of the argument to be packed is used and when unpacking either True or False is always returned. For platforms that don't support _Bool, as well as in non-native mode, 't' packs as a single byte. Test cases are included, as well as a small change to the struct documentation. The patch modifies configure.in to check for _Bool support, and the patch includes the autogenerated configure and pyconfig.h.in files as well. I have tested the module on Mac OS X x86 (uses 1 byte for _Bool) and Mac OS X ppc (uses 4 bytes for _Bool). Ran regression suite.
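As context, a short usage sketch of such a boolean format character. The patch text proposes 't'; the character present in current Python's struct module for native C99 _Bool is '?', which the sketch uses so that it runs as-is:

    import struct

    # Native mode: '?' maps to the platform's C99 _Bool.
    print(struct.calcsize('?'))          # sizeof(_Bool), typically 1
    data = struct.pack('B?', 7, True)    # an unsigned char plus a _Bool
    kind, flag = struct.unpack('B?', data)
    print(kind, flag)                    # 7 True (a tuple on 2.x)

    # In standard (non-native) mode the bool packs as a single byte,
    # and any nonzero byte unpacks as True.
    print(struct.unpack('<?', b'\x05'))  # (True,)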
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-21 10:33 Message: Logged In: YES user_id=21627 Originator: NO Thanks for the patch. Committed as r53508 ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:13 Message: Logged In: YES user_id=2135 Originator: YES Oops! I didn't intend for there to be any ctypes content in this patch (as indicated by the subject), but apparently I forgot to remove part of the ctypes section. I have uploaded a new patch without that part. Once this has been integrated, I'll upload a complete ctypes patch for consideration. File Added: bool struct patch-2.diff ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 21:09 Message: Logged In: YES user_id=11105 Originator: NO The patch is not complete or not correct. Either: - the part of that patch that changes Modules/_ctypes/_ctypes.c should be omitted because it does not contain a ctypes _Bool type - or complete support for a ctypes _Bool type (what would that be called? ctypes.c99_bool?) should be added, together with tests in Lib/ctypes/test ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610575&group_id=5470 From noreply at sourceforge.net Sun Jan 21 11:35:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 21 Jan 2007 02:35:30 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 23:46 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399 Definitely a backport candidate. ---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-21 10:35 Message: Logged In: YES user_id=849994 Originator: NO Committed as rev. 53511, 53512 (2.5). ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-21 05:26 Message: Logged In: YES user_id=1591633 Originator: NO I'd say it looks good. Now let's see if we can get someone to apply it for us. Thanks for adding the tests! ---------------------------------------------------------------------- Comment By: John J Lee (jjlee) Date: 2007-01-21 00:08 Message: Logged In: YES user_id=261020 Originator: YES Added tests. File Added: urllib2_close_socket_v2.patch ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-20 02:44 Message: Logged In: YES user_id=1591633 Originator: NO Patch looks good to me, and the tests still pass. If it matters, I would like to see a test case presented in the patch as well.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Mon Jan 22 12:52:40 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 03:52:40 -0800 Subject: [Patches] [ python-Patches-1641544 ] rlcompleter tab completion in pdb Message-ID: Patches item #1641544, was opened at 2007-01-22 11:52 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641544&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Stephen Emslie (stephenemslie) Assigned to: Nobody/Anonymous (nobody) Summary: rlcompleter tab completion in pdb Initial Comment: By default, Pdb and other instances of Cmd complete names for their commands. However in the context of pdb, I think it is more useful to complete identifiers and keywords in its current scope than to complete names of commands (most of which have single letter abbreviations). I believe this makes pdb a far more usable introspection tool. I have discussed this proposal on the python-ideas list: http://mail.python.org/pipermail/python-ideas/2007-January/000084.html This patch implements the following: - creates an rlcompleter instance on Pdb if readline is available - adds a 'complete' method to the Pdb class. The only difference with rlcompleter's default behaviour is that it also updates rlcompleter's namespace to reflect the current local and global namespace, which is necessary because pdb changes scope as it steps through a program This is a patch against python/Lib/pdb.py rev. 51745 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641544&group_id=5470 From noreply at sourceforge.net Mon Jan 22 18:00:04 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 09:00:04 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 17:00 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Nobody/Anonymous (nobody) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping them in an internal dict (the loggerDict attribute of Manager). Attached a patch using a weakref object, with a test.
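A small demonstration of the reported behaviour (this only reproduces the retention the patch is meant to fix; it is not the patch itself):

    import logging

    # Every new name creates a Logger that the module-level Manager
    # retains in its loggerDict, even after the caller drops its
    # own reference.
    for i in range(1000):
        log = logging.getLogger('client-%d' % i)
        del log

    print(len(logging.Logger.manager.loggerDict))   # 1000: nothing was freed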
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Mon Jan 22 18:09:18 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 09:09:18 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 17:00 Message generated for change (Comment added) made by therve You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Nobody/Anonymous (nobody) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping them in an internal dict (the loggerDict attribute of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- >Comment By: TH (therve) Date: 2007-01-22 17:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping the loggers is mandatory because you must get the same instance back from getLogger. Maybe there needs to be a documented way to remove a logger from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Mon Jan 22 18:33:06 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 09:33:06 -0800 Subject: [Patches] [ python-Patches-1591665 ] adding __dir__ Message-ID: Patches item #1591665, was opened at 2006-11-06 23:52 Message generated for change (Settings changed) made by gangesmaster You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1591665&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 6 Private: No Submitted By: ganges master (gangesmaster) >Assigned to: Barry A. Warsaw (bwarsaw) Summary: adding __dir__ Initial Comment: in accordance with http://mail.python.org/pipermail/python-dev/2006-November/069865.html i've written a patch that allows objects to define their own introspection mechanisms, by providing __dir__. with this patch: * dir() returns the locals. this is done in builtin_dir() * dir(obj) returns the attributes of obj, by invoking PyObject_Dir() * if obj->ob_type has "__dir__", it is used. note that it must return a list!
* otherwise, use the default mechanism of collecting attributes * for module objects, return __dict__.keys() * for type objects, return __dict__.keys() + dir(obj.__base__) * for all other objects, return __dict__.keys() + __members__ + __methods__ + dir(obj.__class__) * builtin_dir takes care of sorting the list ---------------------------------------------------------------------- Comment By: ganges master (gangesmaster) Date: 2006-12-19 23:12 Message: Logged In: YES user_id=1406776 Originator: YES i guess the demo isn't updated/relevant anymore. instead, concrete tests were added to lib/tests/test_builtin.py ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2006-11-23 14:11 Message: Logged In: YES user_id=4771 Originator: NO Line 20 in demo.py: assert "__getitem__" in dir(x) looks strange to me... Foo doesn't inherit from any sequence or mapping type. ---------------------------------------------------------------------- Comment By: ganges master (gangesmaster) Date: 2006-11-11 23:31 Message: Logged In: YES user_id=1406776 > PyObject_CallFunctionObjArgs(dirfunc, obj, NULL) done > Couldn't __dir__ also be allowed to return a tuple? no, because tuples are not sortable, and i don't want to over complicate the c-side code of PyObject_Dir. having __dir__ returning only a list is equivalent to __repr__ returning only strings. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-11-11 21:58 Message: Logged In: YES user_id=849994 * Instead of doing PyObject_CallFunction(dirfunc, "O", obj) you should do PyObject_CallFunctionObjArgs(dirfunc, obj, NULL). * Couldn't __dir__ also be allowed to return a tuple? ---------------------------------------------------------------------- Comment By: ganges master (gangesmaster) Date: 2006-11-08 13:22 Message: Logged In: YES user_id=1406776 i like to init all my locals ("just in case"), but if the rest of the code does not adhere to my style, i'll change that. anyway, i made the changes to the code, updated the docs, and added full tests (the original dir() wasn't tested so thoroughly) -tomer ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-11-08 07:53 Message: Logged In: YES user_id=33168 tomer, do you know about configuring with --pydebug? That helps track down refleaks when running regrtest -R ::. object.c: _dir_locals: result is not necessary and locals doesn't need to be initialized as it's set on the next line. You could just declare and set it all on one line. _specialized_dir_type should be static. No need to init dict. Either don't init result or remove else result = NULL. I'd prefer removing the else and leaving the init. _specialized_dir_module should be static. No need to init dict. Can you get the name of the module and use that in the error msg: PyModule_GetName()? That would hopefully provide a nicer error msg. _generic_dir: No need to init dict. + /* XXX api change: how about falling to obj->ob_type + XXX if no __class__ exists? */ Do you mean falling *back*? Also, we've been using XXX(username): as the format for such comments. So this would be better as: /* XXX(tomer): Perhaps fall back to obj->ob_type if no __class__ exists? */ _dir_object: No need to init dirfunc. PyObject_Dir: No need to init result. Are there tests for all conditions?
At least: * dir() * dir(obj) * dir(obj_with_no_dict) * dir(obj_with_no__class__) * dir(obj_with__methods__) * dir(obj_with__members__) * dir(module) * dir(module_with_no__dict__) * dir(module_with_invalid__dict__) There also need to be updates to Doc/lib/libfuncs.tex. If you can't deal with the markup, just do the best you can in text and someone else will fix up the markup. Thanks for attaching the patch as a single file, it's easier to deal with. ---------------------------------------------------------------------- Comment By: ganges master (gangesmaster) Date: 2006-11-07 17:37 Message: Logged In: YES user_id=1406776 okay: * builtin_dir directly calls PyObject_Dir * PyObject_Dir handles NULL argument and sorting * it is now completely compatible with the 2.5 API * fixed several refcount bugs (i wish we had a tracing gc :) ---------------------------------------------------------------------- Comment By: Nick Coghlan (ncoghlan) Date: 2006-11-07 00:52 Message: Logged In: YES user_id=1038590 The retrieval of locals on a NULL argument and the sorting step need to move back inside PyObject_Dir to avoid changing the C API. If the standard library's current C API tests didn't break on this version of the patch, then the final version of the patch should include enhanced tests for PyObject_Dir that pass both before and after the patch is applied to PyObject_Dir. Other than that, I didn't see any major problems on reading the code (i.e. refcounts and error handling looked pretty reasonable). I haven't actually run it though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1591665&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:08:05 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:08:05 -0800 Subject: [Patches] [ python-Patches-1587674 ] Patch for #1586414 to avoid fragmentation on Windows Message-ID: Patches item #1587674, was opened at 2006-10-31 06:05 Message generated for change (Comment added) made by gustaebel You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1587674&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Enoch Julias (enochjul) Assigned to: Lars Gustäbel (gustaebel) Summary: Patch for #1586414 to avoid fragmentation on Windows Initial Comment: Add a call to file.truncate() to inform Windows of the size of the target file in makefile(). This helps guide cluster allocation in NTFS to avoid fragmentation. ---------------------------------------------------------------------- >Comment By: Lars Gustäbel (gustaebel) Date: 2007-01-22 20:08 Message: Logged In: YES user_id=642936 Originator: NO Closed due to lack of interest. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-12-23 20:03 Message: Logged In: YES user_id=642936 Originator: NO Any progress on this one? ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-11-08 22:30 Message: Logged In: YES user_id=642936 You both still fail to convince me and I still don't see need for action.
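Stepping back to the __dir__ patch (#1591665) above for a moment: a minimal sketch, with invented names, of the hook it proposes. With the patch applied, dir(obj) invokes the type's __dir__ and requires a plain list, which builtin_dir() then sorts.

class Remote(object):
    """Proxy whose attribute names live on some other object."""
    def __init__(self, names):
        self._names = names
    def __dir__(self):
        # must return a list, not a tuple (see the discussion above)
        return list(self._names)

r = Remote(["ping", "reboot", "status"])
print(dir(r))   # with the patch: ['ping', 'reboot', 'status']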
The only case ATM where this addition makes sense (in your opinion) is the Windows OS when using the NTFS filesystem and certain conditions are met. NTFS has a preallocation algorithm to deal with this. We don't know if there is any advantage on FAT filesystems. On Linux for example there is a plethora of supported filesystems. Some of them may take advantage, others may not. Who knows? We can't even detect which filesystem type we are currently writing to. Apart from that, the behaviour of truncate(arg) with arg > filesize seems to be system-dependent. So, IMO this is a very special optimization targeted at a single platform. The TarFile class is easily subclassable, just override the makefile() method and add the two lines of code. I think that's what ActiveState's Python Cookbook is for. BTW, I like my files to grow bit by bit. In case of an error, I can detect if a file was not extracted completely by comparing the file sizes. Furthermore, a file that grows is more common and more what a programmer who uses this module might expect. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2006-11-08 17:33 Message: Logged In: YES user_id=341410 I disagree with user gustaebel. We should be adding automatic truncate calls for all possible supported platforms, in all places where it could make sense. Be it in tarfile, zipfile, wherever we can. It would make sense to write a function that can be called by all of those modules so that there is only one place to update if/when changes occur. If the function were not part of the public Python API, then it wouldn't need to wait until 2.6, unless it were considered a feature addition rather than a bugfix. One would have to wait on a response from Martin or Anthony to know which it was, though I couldn't say for sure if operations that are generally performance enhancing are bugfixes or feature additions. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-11-06 22:57 Message: Logged In: YES user_id=642936 Personally, I think disk defragmenters are evil ;-) They create the need that they are supposed to satisfy at the same time. On Linux we have no defragmenters, so we don't bother about it. I think your proposal is some kind of a performance hack for a particular filesystem. In principle, this problem exists for all filesystems on all platforms. Fragmentation is IMO a filesystem's problem and is not so much a state but more like a process. Filesystems fragment over time and you can't do anything about it. For those people who care, disk defragmenters were invented. It is not tarfile.py's job to care about a fragmented filesystem, that's simply too low level. I admit that it is a small patch, but I'm -1 on having this applied. ---------------------------------------------------------------------- Comment By: Enoch Julias (enochjul) Date: 2006-11-06 18:19 Message: Logged In: YES user_id=6071 I have not really tested FAT/FAT32 yet as I don't use these filesystems now. The Disk Defragmenter tool in Windows 2000/XP shows the number of files/directories fragmented in its report. NTFS does handle growing files, but the operating system can only do so much without knowing the size of the file. Extracting from archives consisting of only several files does not cause fragmentation. However, if the archive has many files, it is much more likely that the default algorithm will fail to allocate contiguous clusters for some files.
It may also depend on the amount of free space fragmentation on a particular partition and whether other processes are writing to other files in the same partition. Some details of the cluster allocation algorithm used in Windows can be found at http://support.microsoft.com/kb/841551. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-11-01 16:27 Message: Logged In: YES user_id=642936 Is this merely an NTFS problem or is it the same with FAT fs? How do you detect file fragmentation? Doesn't this problem apply to all other modules or scripts that write to file objects as well? Shouldn't a decent filesystem be able to handle growing files in a correct manner? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1587674&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:40:44 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:40:44 -0800 Subject: [Patches] [ python-Patches-1637157 ] urllib: change email.Utils -> email.utils Message-ID: Patches item #1637157, was opened at 2007-01-16 22:08 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637157&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 >Status: Closed >Resolution: Fixed Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: change email.Utils -> email.utils Initial Comment: urllib uses the old name email.Utils instead of the new name email.utils. This confuses py2app and possibly other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right). ---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-22 19:40 Message: Logged In: YES user_id=849994 Originator: NO Fixed in trunk. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637157&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:41:02 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:41:02 -0800 Subject: [Patches] [ python-Patches-1637159 ] urllib2: email.Utils->email.utils Message-ID: Patches item #1637159, was opened at 2007-01-16 22:09 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637159&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2: email.Utils->email.utils Initial Comment: urllib2 uses the old name email.Utils instead of the new name email.utils. This may confuse py2app and/or other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right).
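For context on the email renames in #1637157 and #1637159 (and #1637162 below): email 4.0, shipped with Python 2.5, lowercased the submodule names while keeping the old mixed-case names importable for backwards compatibility, so the change is a mechanical respelling. A sketch:

# old spelling, still importable in Python 2.5 but reportedly confusing
# to packagers such as py2app (see the reports above):
from email.Utils import formatdate
# new spelling, which the patched urllib/urllib2/smtplib use:
from email.utils import formatdate

print(formatdate())   # e.g. 'Thu, 25 Jan 2007 10:38:26 -0000'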
---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-22 19:41 Message: Logged In: YES user_id=849994 Originator: NO Fixed in trunk. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637159&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:41:20 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:41:20 -0800 Subject: [Patches] [ python-Patches-1637162 ] smtplib email renames Message-ID: Patches item #1637162, was opened at 2007-01-16 22:11 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637162&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: smtplib email renames Initial Comment: smtplib uses the old names email.Utils and email.base64MIME instead of the new email.utils and email.base64mime. This may confuse py2app and/or other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right). ---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-22 19:41 Message: Logged In: YES user_id=849994 Originator: NO Fixed in trunk. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637162&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:44:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:44:30 -0800 Subject: [Patches] [ python-Patches-1635639 ] ConfigParser does not quote % Message-ID: Patches item #1635639, was opened at 2007-01-15 02:43 Message generated for change (Comment added) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635639&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. >Category: None >Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: ConfigParser does not quote % Initial Comment: This is covered by bug 1603688 (https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1603688&group_id=5470) I implemented 2 versions of this patch. One version raises ValueError when an invalid interpolation syntax is encountered (such as foo%, foo%bar, and %foo, but not %%foo and %(dir)foo). The other version simply replaces appropriate %s with %%s. Initially, I believed ValueError was the appropriate way to go with this. However, when I thought about how I use ConfigParser, I realized that it would be far nicer if it simply worked. I'm +0.5 to ValueError, and +1 to munging the values. ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2007-01-22 14:44 Message: Logged In: YES user_id=11375 Originator: NO Turning into a patch. 
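To make the interpolation problem in #1635639 concrete, a hedged sketch against Python 2's SafeConfigParser (the section and option names here are invented): a bare '%' in a value breaks interpolation unless it is escaped as '%%' or interpolation is bypassed.

import ConfigParser
from StringIO import StringIO

cfg = ConfigParser.SafeConfigParser()
cfg.readfp(StringIO("[paths]\nlog = /var/log/app/100%\n"))

# cfg.get("paths", "log") raises InterpolationSyntaxError here, because
# '%' must be followed by '%' or '(' during interpolation.
print(cfg.get("paths", "log", raw=True))   # raw access skips interpolation

cfg.set("paths", "log", "/var/log/app/100%%")   # the escaped form round-trips
print(cfg.get("paths", "log"))                  # -> /var/log/app/100%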
---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-15 21:17 Message: Logged In: YES user_id=1591633 Originator: YES For the record, this was supposed to be a patch. I don't know if the admins have any way of moving it to that category. I guess that explained the funky categories and groups. Sorry for the inconvenience. ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-15 02:44 Message: Logged In: YES user_id=1591633 Originator: YES File Added: bug_1603688_cfgparser_munges.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635639&group_id=5470 From noreply at sourceforge.net Tue Jan 23 05:47:12 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 20:47:12 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 09:00 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: TH (therve) >Assigned to: Vinay Sajip (vsajip) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping it into an internal dict (attribute loggerDict of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-22 20:47 Message: Logged In: YES user_id=33168 Originator: NO Vinay, can you provide some direction? Thanks. ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-22 09:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping it is mandatory because you must get the same instance with getLogger. Maybe it'd need a documented way to remove from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470
From noreply at sourceforge.net Tue Jan 23 09:42:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 00:42:30 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 17:00 Message generated for change (Comment added) made by vsajip You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 >Status: Closed >Resolution: Invalid Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Vinay Sajip (vsajip) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping it into an internal dict (attribute loggerDict of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- >Comment By: Vinay Sajip (vsajip) Date: 2007-01-23 08:42 Message: Logged In: YES user_id=308438 Originator: NO This is not a leak - it's by design. You are not using best practice when you create a logger per client; the specific scenario of getting connection info in the logging message can currently be done in several ways, e.g. 1. Use the 'extra' parameter (added in Python 2.5). 2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info. 3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-23 04:47 Message: Logged In: YES user_id=33168 Originator: NO Vinay, can you provide some direction? Thanks. ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-22 17:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping it is mandatory because you must get the same instance with getLogger. Maybe it'd need a documented way to remove from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Tue Jan 23 09:54:54 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 00:54:54 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 17:00 Message generated for change (Comment added) made by therve You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Closed Resolution: Invalid Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Vinay Sajip (vsajip) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping it into an internal dict (attribute loggerDict of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- >Comment By: TH (therve) Date: 2007-01-23 08:54 Message: Logged In: YES user_id=1038797 Originator: YES OK I understand the design. But it's not clear in the documentation that once you've called getLogger('id') the logger will live forever. It's especially problematic in long-running processes. It would be great to have at least a warning in the documentation about this feature. ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-23 08:42 Message: Logged In: YES user_id=308438 Originator: NO This is not a leak - it's by design. You are not using best practice when you create a logger per client; the specific scenario of getting connection info in the logging message can currently be done in several ways, e.g. 1. Use the 'extra' parameter (added in Python 2.5). 2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info. 3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-23 04:47 Message: Logged In: YES user_id=33168 Originator: NO Vinay, can you provide some direction? Thanks. ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-22 17:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping it is mandatory because you must get the same instance with getLogger. Maybe it'd need a documented way to remove from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Tue Jan 23 12:18:39 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 03:18:39 -0800 Subject: [Patches] [ python-Patches-1507247 ] tarfile extraction does not honor umask Message-ID: Patches item #1507247, was opened at 2006-06-16 14:11 Message generated for change (Comment added) made by gustaebel You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1507247&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Faik Uygur (faik) Assigned to: Lars Gustäbel (gustaebel) Summary: tarfile extraction does not honor umask Initial Comment: If the upperdirs in the member file's pathname do not exist,
tarfile creates those paths with 0777 permission bits and does not honor umask. This patch uses umask to set the ti.mode of the created directory for later usage in chmod.

--- tarfile.py (revision 46993)
+++ tarfile.py (working copy)
@@ -1560,7 +1560,9 @@
             ti = TarInfo()
             ti.name = upperdirs
             ti.type = DIRTYPE
-            ti.mode = 0777
+            umask = os.umask(0)
+            ti.mode = 0777 - umask
+            os.umask(umask)
             ti.mtime = tarinfo.mtime
             ti.uid = tarinfo.uid
             ti.gid = tarinfo.gid

---------------------------------------------------------------------- >Comment By: Lars Gustäbel (gustaebel) Date: 2007-01-23 12:18 Message: Logged In: YES user_id=642936 Originator: NO Committed my patch as rev. 53526. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-12-31 12:52 Message: Logged In: YES user_id=642936 Originator: NO I've come to the conclusion that it is a doubtful approach to take the mtime and ownership from the file and use it on the upper directories as well. So, I've come up with a totally different solution (cp. makedirs.diff) that abandons the use of os.umask() completely and uses a single call to os.makedirs() to create the missing directories. It seems very attractive to me to do it this way, what do you think? File Added: makedirs.diff ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-30 19:25 Message: Logged In: YES user_id=161998 Originator: NO umask(2) works in the same way, so there seems to be no unixy way to inspect umask without setting it. I think the solution would be to make a C-level function to return the umask (by setting and resetting it). As the interpreter itself is single threaded, this is race-free. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-12-30 13:11 Message: Logged In: YES user_id=642936 Originator: NO In order to determine the current umask we have no other choice AFAIK than to set it with a bogus value, save the return value and restore it right away - as you proposed in your patch. The problem is that there is a small window of time between these two calls where the umask is invalid. This is especially bad in multi-threaded environments. Any ideas? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-06 00:40 Message: Logged In: YES user_id=161998 Originator: NO Hi, I can reproduce this problem on python 2.4, and patch applies to python 2.5 too. Fix looks good to me. ---------------------------------------------------------------------- Comment By: Faik Uygur (faik) Date: 2006-08-18 11:44 Message: Logged In: YES user_id=1541018 Above patch is wrong. The correct one is attached.
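A pure-Python sketch of the umask idiom debated in #1507247 (POSIX has no "get umask" call, hence the set-and-restore dance and the race window the reviewers worry about; Python 2 octal literals):

import os

def current_umask():
    # Set a bogus umask, then immediately restore the old one. Between
    # the two calls the process umask is 0: the race discussed above.
    mask = os.umask(0)
    os.umask(mask)
    return mask

print(oct(0777 & ~current_umask()))   # mode a newly created directory gets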
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1507247&group_id=5470 From noreply at sourceforge.net Tue Jan 23 15:02:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 06:02:33 -0800 Subject: [Patches] [ python-Patches-1642547 ] Fix error/crash in AST: syntaxerror in complex ifs Message-ID: Patches item #1642547, was opened at 2007-01-23 15:02 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Neal Norwitz (nnorwitz) Summary: Fix error/crash in AST: syntaxerror in complex ifs Initial Comment: Fix a bug in Python/ast.c, where a particular syntaxerror in an 'if' with one or more 'elif's would be ignored or mishandled:

timberwolf:~/python/python/trunk > cat test2.py
def bug():
    if w:
        dir()=1
    elif v:
        pass
timberwolf:~/python/python/trunk > python2.4 test2.py
  File "test2.py", line 3
    dir()=1
SyntaxError: can't assign to function call
timberwolf:~/python/python/trunk > python2.5 test2.py
Exception exceptions.SyntaxError: ("can't assign to function call", 3) in 'garbage collection' ignored
Fatal Python error: unexpected exception during garbage collection
Aborted

The actual problem is the lack of error checks on the return values of ast_for_expr() and ast_for_suite, in ast_for_if_stmt. Attached patch fixes. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642547&group_id=5470 From noreply at sourceforge.net Tue Jan 23 15:02:53 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 06:02:53 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 23:55 Message generated for change (Settings changed) made by twouters You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None >Status: Closed >Resolution: Fixed Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Wouters (twouters) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain, when someone only replaces one of them (in sitecustomize.)
Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this, from a unittest :P ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 07:56 Message: Logged In: YES user_id=33168 Originator: NO Forgot to mention that I agree about the warning. If no one noticed so far, this is such an obscure case, it's not that important to warn. Either way is fine with me. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 07:55 Message: Logged In: YES user_id=33168 Originator: NO I can think of a nasty way to test this, but it's not really worth it. You'd need to 'install' your own sitecustomize.py by setting PYTHONPATH and spawning a python. Ok, so it's not a real unit test, but it is a test. :-) This looks like it will also crash (before and after the patch) if sys.std{in,out,err} are just deleted rather than replaced (pythonrun.c). sysmodule.c looks fine. I think this is fine for 2.5.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Tue Jan 23 15:04:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 06:04:27 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 23:55 Message generated for change (Comment added) made by twouters You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None Status: Closed Resolution: Fixed Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Wouters (twouters) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain, when someone only replaces one of them (in sitecustomize.) Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this, from a unittest :P ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2007-01-23 15:04 Message: Logged In: YES user_id=34209 Originator: YES Oh, for the record: I was unable to produce a crash by *deleting* sys.stdin/stdout/stderr (although it produced funny results. In particular when I added a 'print' statement after the deletes, in my sitecustomize.py, to make sure it was getting run.
Of course, the print never arrived ;-P) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 07:56 Message: Logged In: YES user_id=33168 Originator: NO Forgot to mention that I agree about the warning. If no one noticed so far, this is such an obscure case, it's not that important to warn. Either way is fine with me. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 07:55 Message: Logged In: YES user_id=33168 Originator: NO I can think of a nasty way to test this, but it's not really worth it. You'd need to 'install' your own sitecustomize.py by setting PYTHONPATH and spawning a python. Ok, so it's not a real unit test, but it is a test. :-) This looks like it will also crash (before and after the patch) if sys.std{in,out,err} are just deleted rather than replaced (pythonrun.c). sysmodule.c looks fine. I think this is fine for 2.5.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Tue Jan 23 18:06:59 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 09:06:59 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 18:00 Message generated for change (Comment added) made by pitrou You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Closed Resolution: Invalid Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Vinay Sajip (vsajip) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping it into an internal dict (attribute loggerDict of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- Comment By: Antoine Pitrou (pitrou) Date: 2007-01-23 18:06 Message: Logged In: YES user_id=133955 Originator: NO Ok, since I was the one bitten by this bug I might as well add my 2 cents to the discussion. vsajip: > 1. Use the 'extra' parameter (added in Python 2.5). This is not practical. I want to define a prefix once and for all for all log messages that will be output in a given context. Explicitly adding a parameter to every log call does not help. (of course I can write wrappers to do this automatically - and that's what I ended up doing -, but then I must write 6 of them: one for each of "debug", "info", "warning", "error", "critical", and "exception"...) > 2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info. I don't even know what this means, but it sounds way overkill... > 3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing. 
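(A sketch of option 3 for concreteness; the class and names here are invented, and str() is only called when the record is actually formatted:)

import logging

class ConnMessage(object):
    """Message object that prefixes connection info when rendered."""
    def __init__(self, ip, port, text):
        self.ip, self.port, self.text = ip, port, text
    def __str__(self):
        return "%s:%s %s" % (self.ip, self.port, self.text)

logging.basicConfig()
logging.getLogger("server").warning(ConnMessage("10.0.0.1", 4242, "connection lost"))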
IIUC this means some explicit machinery on each logging call, since I have to wrap every string in a constructor. Just like the "extra" parameter, with a slightly different flavour. It's disturbing that the logging module has so many powerful options but no way of conveniently doing simple things without creating memory leaks... ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-23 09:54 Message: Logged In: YES user_id=1038797 Originator: YES OK I understand the design. But it's not clear in the documentation that once you've called getLogger('id') the logger will live forever. It's especially problematic on long-running processes. It would be great to have at least a warning in the documentation about this feature. ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-23 09:42 Message: Logged In: YES user_id=308438 Originator: NO This is not a leak - it's by design. You are not using best practice when you create a logger per client; the specific scenario of getting connection info in the logging message can currently be done in several ways, e.g. 1. Use the 'extra' parameter (added in Python 2.5). 2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info. 3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-23 05:47 Message: Logged In: YES user_id=33168 Originator: NO Vinay, can you provide some direction? Thanks. ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-22 18:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping it is mandatory because you must get the same instance with getLogger. Maybe it'd need a documented way to remove from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Tue Jan 23 20:34:28 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 11:34:28 -0800 Subject: [Patches] [ python-Patches-1642844 ] comments to clarify complexobject.c Message-ID: Patches item #1642844, was opened at 2007-01-23 14:34 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642844&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Nobody/Anonymous (nobody) Summary: comments to clarify complexobject.c Initial Comment: The constructor for a complex takes two values, representing the real and imaginary parts. Obviously, these should normally both be real numbers, but they don't have to be.
The code to cater to complex arguments led to even Tim Peters asking WTF? http://mail.python.org/pipermail/python-dev/2007-January/070732.html This patch just adds comments. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642844&group_id=5470 From noreply at sourceforge.net Wed Jan 24 16:50:29 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 24 Jan 2007 07:50:29 -0800 Subject: [Patches] [ python-Patches-1643641 ] Fix Bug 1362475 Text.edit_modified() doesn't work Message-ID: Patches item #1643641, was opened at 2007-01-24 15:50 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643641&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tkinter Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Matthias Kievernagel (mkiever) Assigned to: Martin v. Löwis (loewis) Summary: Fix Bug 1362475 Text.edit_modified() doesn't work Initial Comment: Text.edit_modified() called _getints() for boolean return values causing an exception. The patch below removes _getints call. The other Text.edit_*() functions have no return values so they still work after applying the patch. Greetings, Matthias Kievernagel ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643641&group_id=5470 From noreply at sourceforge.net Wed Jan 24 22:12:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 24 Jan 2007 13:12:11 -0800 Subject: [Patches] [ python-Patches-1643874 ] ctypes leaks memory Message-ID: Patches item #1643874, was opened at 2007-01-24 22:12 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643874&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Thomas Heller (theller) Assigned to: Thomas Heller (theller) Summary: ctypes leaks memory Initial Comment: This program leaks memory, because a string is allocated with the win32 call SysAllocString(), but SysFreeString() is never called.

"""
from ctypes import oledll, _SimpleCData

class BSTR(_SimpleCData):
    _type_ = "X"

func = oledll.oleaut32.SysStringLen
func.argtypes = (BSTR,)

while 1:
    func("abcdefghijk")
"""

The attached patch fixes this. (The BSTR data type is not exposed by ctypes or ctypes.wintypes, because it is only used in connection with Windows COM objects.) The patch should be applied to release25-maint and trunk.
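Until the fix lands, a Windows-only sketch of the manual pairing that the patch automates for the "X" (BSTR) type: every SysAllocString() needs a matching SysFreeString().

from ctypes import windll, c_wchar_p, c_void_p

oleaut32 = windll.oleaut32
oleaut32.SysAllocString.restype = c_void_p   # keep the full pointer on 64-bit
oleaut32.SysAllocString.argtypes = [c_wchar_p]

bstr = oleaut32.SysAllocString(u"abcdefghijk")
try:
    print(oleaut32.SysStringLen(c_void_p(bstr)))   # -> 11
finally:
    oleaut32.SysFreeString(c_void_p(bstr))         # no leak this time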
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643874&group_id=5470 From noreply at sourceforge.net Thu Jan 25 10:00:35 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 25 Jan 2007 01:00:35 -0800 Subject: [Patches] [ python-Patches-1638879 ] Fix to the long("123\0", 10) problem Message-ID: Patches item #1638879, was opened at 2007-01-18 20:03 Message generated for change (Comment added) made by lhorn You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638879&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Calvin Spealman (ironfroggy) Assigned to: Nobody/Anonymous (nobody) Summary: Fix to the long("123\0", 10) problem Initial Comment: This is a simple patch adapted from the int_new function to the long_new function. ---------------------------------------------------------------------- Comment By: Lutz Horn (lhorn) Date: 2007-01-25 10:00 Message: Logged In: YES user_id=96760 Originator: NO This patch compiles and passes all tests against revisions 53406 and 53549 on Ubuntu 6.06.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638879&group_id=5470 From noreply at sourceforge.net Thu Jan 25 10:38:26 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 25 Jan 2007 01:38:26 -0800 Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib Message-ID: Patches item #1644218, was opened at 2007-01-25 10:38 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Daniel Nogradi (nogradi) Assigned to: Nobody/Anonymous (nobody) Summary: file -> open in stdlib Initial Comment: AFAIK using file( ) to open a file is deprecated in favor of open( ), and while grepping through the stdlib I noticed a couple of occurrences of file( ) in the latest revision. This patch changes these calls to open( ); all tests pass. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470 From noreply at sourceforge.net Thu Jan 25 18:57:20 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 25 Jan 2007 09:57:20 -0800 Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe Message-ID: Patches item #1564547, was opened at 2006-09-24 15:13 Message generated for change (Comment added) made by gustavo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Gustavo J. A. M. Carneiro (gustavo) Assigned to: Nobody/Anonymous (nobody) Summary: Py_signal_pipe Initial Comment: Problem: how to wakeup extension modules running poll() so that they can let python check for signals. Solution: use a pipe to communicate between signal handlers and main thread. The read end of the pipe can then be monitored by poll/select for input events and wake up poll(). As a side benefit, it avoids the usage of Py_AddPendingCall / Py_MakePendingCalls, which are patently not "async safe". All explained in this thread: http://mail.python.org/pipermail/python-dev/2006-September/068569.html ---------------------------------------------------------------------- >Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-25 17:57 Message: Logged In: YES user_id=908 Originator: YES Damn this SF bug tracker! ;( The patch I uploaded (yes, it was me, not anonymous) fixes some bugs and also fixes http://www.python.org/sf/1643738 ---------------------------------------------------------------------- Comment By: Adam Olsen (rhamphoryncus) Date: 2006-09-29 22:09 Message: Logged In: YES user_id=12364 I'm concerned about the interface to PyOS_InterruptOccurred(). The original version peeked ahead for only that signal, and handled it manually. No need to report errors. The new version will first call arbitrary python functions to handle any earlier signals, then an arbitrary python function for the interrupt itself, and then will not report any errors they produce. It may not even get to the interrupt, even if one is waiting. I'm not sure PyOS_InterruptOccurred() is called when arbitrary python code is acceptable. I suspect it should be dropped entirely, in favour of a more robust API. Otoh, some of it appears quite crufty. One version in intrcheck.c lacks a return statement, invoking undefined behavior in C. One other concern I have is that signalmodule.c should never be unloaded, if loaded via dlopen. A delayed signal handler may reference it indefinitely. However, I see no sane way to enforce this. ---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2006-09-28 16:31 Message: Logged In: YES user_id=908 > ...sizeof(char) will STILL return 1 in such a case... Even if sizeof(char) == 1, 'sizeof(signum_c)' is much more readable than just a plain '1'.
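The idea at the heart of this patch, the classic self-pipe trick, can be sketched at the Python level (POSIX and Python 2 assumed; the C patch does the equivalent with Py_signal_pipe):

import os, select, signal

rfd, wfd = os.pipe()

def handler(signum, frame):
    # the C patch writes the signal number from the C-level handler;
    # one byte per signal, dropped if the pipe ever fills up
    os.write(wfd, chr(signum))

signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)    # deliver a signal to ourselves

# a poll()/select() loop can watch rfd alongside its other descriptors
# and wakes up as soon as a signal arrives:
readable, _, _ = select.select([rfd], [], [], 1.0)
if readable:
    print("got signal %d" % ord(os.read(rfd, 1)))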
---------------------------------------------------------------------- Comment By: Adam Olsen (rhamphoryncus) Date: 2006-09-28 03:50 Message: Logged In: YES user_id=12364 Any compiler where sizeof(char) != 1 is *deeply* broken. In C, a byte isn't always 8 bits (if it uses bits at all!). It's possible for a char to take (for instance) 32 bits, but sizeof(char) will STILL return 1 in such a case. A mention of this in the wild is here: http://lkml.org/lkml/1998/1/22/4 If you find a compiler that's broken, I'd love to hear about it. :) # error Too many signals to fit on an unsigned char! Should be "in", not "on" :) A comment in signal_handler() about ignoring the return value of write() may be good. initsignal() should avoid replacing Py_signal_pipe/Py_signal_pipe_w if called a second time (which is possible, right?). If so, it should probably not set them until after setting non-blocking mode. check_signals() should not call PyEval_CallObject(Handlers[signum].func, ...) if func is NULL, which may happen after finisignal() clears it.
---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2006-09-27 15:34 Message: Logged In: YES user_id=908 and of course this > * PyErr_SetInterrupt() needs to set is_tripped after the call to write(), not before. is correct, good catch. New patch uploaded. ---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2006-09-27 14:42 Message: Logged In: YES user_id=908 > * Needs documentation ... True, I'll try to add more documentation... > * I think we should be more paranoid about the range of possible signals. NSIG does not appear to be defined by SUSv2 (no clue about Posix). We should size the Handlers array to UCHAR_MAX and set any signals outside the range of 0..UCHAR_MAX to either 0 (null signal) or UCHAR_MAX. I'm not sure we should ever use NSIG. I disagree. Creating an array of size UCHAR_MAX is just wasting memory. If you check the original python code, there's already fallback code to define NSIG if it's not already defined (if not defined, it could end up being defined as 64). > * In signal_handler() sizeof(signum_c) is inherently 1. ;) And? I occasionally hear horror stories of platforms where sizeof(char) != 1, I'm not taking any chances :) > * PyOS_InterruptOccurred() should probably still check that it's called from the main thread. check_signals already bails out if that is the case. But in fact it bails out without setting the interrupt_occurred output parameter, so I fixed that. fcntl error checking... will work on it. ---------------------------------------------------------------------- Comment By: Adam Olsen (rhamphoryncus) Date: 2006-09-27 00:53 Message: Logged In: YES user_id=12364 I've looked over the patch, although I haven't tested it. I have the following suggestions: * Needs documentation explaining the signal weirdness (may drop signals, may delay indefinitely, new handlers may get signals intended for old, etc) * Needs to be explicit that users must only poll/select to check for readability of the pipe, NOT read from it * The comment for is_tripped refers to sigcheck(), which doesn't exist * I think we should be more paranoid about the range of possible signals. NSIG does not appear to be defined by SUSv2 (no clue about Posix). We should size the Handlers array to UCHAR_MAX and set any signals outside the range of 0..UCHAR_MAX to either 0 (null signal) or UCHAR_MAX. I'm not sure we should ever use NSIG. * In signal_handler() sizeof(signum_c) is inherently 1. ;) * The set_nonblock macro doesn't check for errors from fcntl(). I'm not sure it's worth having a macro for that anyway. * Needs some documentation of the assumptions about read()/write() being memory barriers. * In check_signals() sizeof(signum) is inherently 1. * There's a blank line with tabs near the end of check_signals() ;) * PyErr_SetInterrupt() should use a compile-time check for SIGINT being within 0..UCHAR_MAX, assuming NSIG is ripped out entirely. * PyErr_SetInterrupt() needs to set is_tripped after the call to write(), not before. * PyOS_InterruptOccurred() should probably still check that it's called from the main thread.
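Regarding the set_nonblock remark in the review above, a Python-level sketch of the same setup; here the fcntl() calls raise on failure instead of being silently ignored:

import fcntl, os

rfd, wfd = os.pipe()
for fd in (rfd, wfd):
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)   # raises on failure
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)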
----------------------------------------------------------------------

Comment By: Adam Olsen (rhamphoryncus)
Date: 2006-09-28 03:50

Message:
Logged In: YES
user_id=12364

Any compiler where sizeof(char) != 1 is *deeply* broken. In C, a byte isn't always 8 bits (if it uses bits at all!). It's possible for a char to take (for instance) 32 bits, but sizeof(char) will STILL return 1 in such a case. A mention of this in the wild is here: http://lkml.org/lkml/1998/1/22/4

If you find a compiler that's broken, I'd love to hear about it. :)

# error Too many signals to fit on an unsigned char!

Should be "in", not "on" :)

A comment in signal_handler() about ignoring the return value of write() may be good.

initsignal() should avoid replacing Py_signal_pipe/Py_signal_pipe_w if called a second time (which is possible, right?). In any case, it should probably not set them until after setting non-blocking mode.

check_signals() should not call PyEval_CallObject(Handlers[signum].func, ...) if func is NULL, which may happen after finisignal() clears it.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470
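[A minimal pure-Python sketch of the mechanism the patch implements in C. The real patch works inside signalmodule.c; the names below are illustrative, and select.poll() assumes a Unix platform. The signal handler does nothing but write one byte to a pipe; poll() then wakes up on the read end and the real work happens outside the handler.]

    import os
    import select
    import signal

    rfd, wfd = os.pipe()

    def handler(signum, frame):
        # The only async work: write a single byte naming the signal.
        os.write(wfd, chr(signum))

    signal.signal(signal.SIGUSR1, handler)
    os.kill(os.getpid(), signal.SIGUSR1)

    # An event loop (or any extension module running poll) can watch rfd.
    poller = select.poll()
    poller.register(rfd, select.POLLIN)
    for fd, event in poller.poll():
        print "woken up by signal", ord(os.read(rfd, 1))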
From noreply at sourceforge.net Thu Jan 25 19:14:37 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 10:14:37 -0800
Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib
Message-ID:

Patches item #1644218, was opened at 2007-01-25 09:38
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Daniel Nogradi (nogradi)
Assigned to: Nobody/Anonymous (nobody)
Summary: file -> open in stdlib

Initial Comment:
AFAIK using file() to open a file is deprecated in favor of open(), and while grepping through the stdlib I noticed a couple of occurrences of file() in the latest revision. This patch changes these calls to open(); all tests pass.

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-25 18:14

Message:
Logged In: YES
user_id=849994
Originator: NO

I think we should do this at least in the Py3k branch.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

From noreply at sourceforge.net Thu Jan 25 19:38:40 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 10:38:40 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 08:13
Message generated for change (Comment added) made by rhamphoryncus
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------

Comment By: Adam Olsen (rhamphoryncus)
Date: 2007-01-25 11:38

Message:
Logged In: YES
user_id=12364
Originator: NO

gustavo, there's two patches attached and it's not entirely clear which one is current. Please delete the older one.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470
From noreply at sourceforge.net Thu Jan 25 20:22:19 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 11:22:19 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 10:13
Message generated for change (Comment added) made by kuran
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------

Comment By: Jp Calderone (kuran)
Date: 2007-01-25 14:22

Message:
Logged In: YES
user_id=366566
Originator: NO

The attached patch also fixes a bug in the order in which signal handlers are run. Previously, they would be run in numerically ascending signal number order. With the patch attached, they will be run in the order they are processed by Python.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470
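[A toy illustration of the ordering change Jp describes, under the assumption that the pipe carries one byte per delivered signal: draining the pipe replays signals in arrival order, whereas the old code scanned the Handlers array in ascending signal-number order. The numbers below are made up.]

    # Hypothetical delivery order of three signals, as bytes in the pipe.
    arrived = [30, 2, 10]

    pipe_order = list(arrived)    # patched: first-in, first-out
    scan_order = sorted(arrived)  # unpatched: ascending signal numbers

    print pipe_order              # [30, 2, 10]
    print scan_order              # [2, 10, 30]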
From noreply at sourceforge.net Thu Jan 25 20:24:41 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 11:24:41 -0800
Subject: [Patches] [ python-Patches-1643874 ] ctypes leaks memory
Message-ID:

Patches item #1643874, was opened at 2007-01-24 22:12
Message generated for change (Comment added) made by theller
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643874&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.5
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Thomas Heller (theller)
Assigned to: Thomas Heller (theller)
Summary: ctypes leaks memory

Initial Comment:
This program leaks memory, because a string is allocated with the win32 call SysAllocString(), but SysFreeString() is never called.

"""
from ctypes import oledll, _SimpleCData

class BSTR(_SimpleCData):
    _type_ = "X"

func = oledll.oleaut32.SysStringLen
func.argtypes = (BSTR,)

while 1:
    func("abcdefghijk")
"""

The attached patch fixes this. (The BSTR data type is not exposed by ctypes or ctypes.wintypes, because it is only used in connection with Windows COM objects.)

The patch should be applied to release25-maint and trunk.

----------------------------------------------------------------------

>Comment By: Thomas Heller (theller)
Date: 2007-01-25 20:24

Message:
Logged In: YES
user_id=11105
Originator: YES

Fixed in r53556, r53557 (trunk) and r53558 (release25-maint).

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643874&group_id=5470
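[The invariant the patch restores, every SysAllocString() matched by a SysFreeString(), can be seen in a small Windows-only sketch that drives the same oleaut32 calls by hand; this is an illustration of the pairing, not the patch itself.]

    import ctypes

    oleaut32 = ctypes.windll.oleaut32
    oleaut32.SysAllocString.restype = ctypes.c_void_p
    oleaut32.SysAllocString.argtypes = [ctypes.c_wchar_p]
    oleaut32.SysFreeString.argtypes = [ctypes.c_void_p]
    oleaut32.SysStringLen.argtypes = [ctypes.c_void_p]

    bstr = oleaut32.SysAllocString(u"abcdefghijk")  # the allocation
    try:
        print oleaut32.SysStringLen(bstr)           # -> 11
    finally:
        oleaut32.SysFreeString(bstr)                # the call the leaking code never made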
From noreply at sourceforge.net Thu Jan 25 23:12:30 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 14:12:30 -0800
Subject: [Patches] [ python-Patches-1644818 ] Allow importing built-in submodules
Message-ID:

Patches item #1644818, was opened at 2007-01-25 22:12
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644818&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code)
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Miguel Lobo (mlobo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Allow importing built-in submodules

Initial Comment:
At the moment, importing built-in submodules (in my case PyQt4.QtCore and PyQt4.QtGui) does not work. This seems to be because find_module in import.c checks only the module name (e.g. QtCore) against the built-in list, when it should use the full name (e.g. PyQt4.QtCore) instead.

Also, the above check is performed after the code that checks whether the parent module is frozen, which would have already exited in that case.

By moving the is_builtin() check to earlier in find_module and using fullname instead of name, I can build PyQt4.QtCore and PyQt4.QtGui into the interpreter and import and use them with no problem whatsoever, even if their parent module (PyQt4) is frozen.

I have run the regression tests and everything seems OK. I am completely new to CPython development, so it is quite possible that my solution is undesirable or that I have done something incorrectly. Please let me know if that is the case.

Finally, the attached patch is for Python 2.5, but I have checked that it also applies to the current svn trunk with only a one-line offset.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644818&group_id=5470
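[A quick way to see what the report is about, assuming an interpreter built with such statically linked submodules; PyQt4.QtCore is the reporter's example, not something a stock build contains. The table of built-ins is keyed by the name in the module init table, so on such a build it is the full dotted name that has to be looked up.]

    import sys

    # On a build with PyQt4.QtCore compiled in under its full dotted name,
    # that full name is what appears among the built-ins; 'QtCore' alone
    # does not.
    print 'PyQt4.QtCore' in sys.builtin_module_names
    print 'QtCore' in sys.builtin_module_names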
From noreply at sourceforge.net Thu Jan 25 23:23:02 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 14:23:02 -0800
Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib
Message-ID:

Patches item #1644218, was opened at 2007-01-25 10:38
Message generated for change (Comment added) made by nogradi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Daniel Nogradi (nogradi)
Assigned to: Nobody/Anonymous (nobody)
Summary: file -> open in stdlib

----------------------------------------------------------------------

>Comment By: Daniel Nogradi (nogradi)
Date: 2007-01-25 23:23

Message:
Logged In: YES
user_id=1438337
Originator: YES

Sounds good :)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

From noreply at sourceforge.net Fri Jan 26 19:58:25 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri, 26 Jan 2007 10:58:25 -0800
Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib
Message-ID:

Patches item #1644218, was opened at 2007-01-25 10:38
Message generated for change (Comment added) made by nogradi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Daniel Nogradi (nogradi)
Assigned to: Nobody/Anonymous (nobody)
Summary: file -> open in stdlib

----------------------------------------------------------------------

>Comment By: Daniel Nogradi (nogradi)
Date: 2007-01-26 19:58

Message:
Logged In: YES
user_id=1438337
Originator: YES

I've just checked, and in the py3k branch this has already been done.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

From noreply at sourceforge.net Sat Jan 27 09:48:33 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 00:48:33 -0800
Subject: [Patches] [ python-Patches-1642844 ] comments to clarify complexobject.c
Message-ID:

Patches item #1642844, was opened at 2007-01-23 19:34
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642844&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code)
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jim Jewett (jimjjewett)
>Assigned to: Tim Peters (tim_one)
Summary: comments to clarify complexobject.c

Initial Comment:
The constructor for a complex takes two values, representing the real and imaginary parts. Obviously, these should normally both be real numbers, but they don't have to be. The code to cater to complex arguments led to even Tim Peters asking WTF?

http://mail.python.org/pipermail/python-dev/2007-January/070732.html

This patch just adds comments.

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 08:48

Message:
Logged In: YES
user_id=849994
Originator: NO

Let Tim decide whether these are useful.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642844&group_id=5470
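[For readers wondering what a complex argument to the complex() constructor even means: complex(a, b) behaves like a + b*1j, so a complex value in either slot is folded in accordingly. A quick worked example:]

    # complex(a, b) == a + b*1j, even when a and b are themselves complex.
    a = 1 + 2j
    b = 3 + 4j
    print complex(a, b)      # (-3+5j)
    print a + b * 1j         # (-3+5j), the same thing spelled out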
From noreply at sourceforge.net Sat Jan 27 09:52:29 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 00:52:29 -0800
Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib
Message-ID:

Patches item #1644218, was opened at 2007-01-25 09:38
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Postponed
Priority: 5
Private: No
Submitted By: Daniel Nogradi (nogradi)
Assigned to: Nobody/Anonymous (nobody)
Summary: file -> open in stdlib

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 08:52

Message:
Logged In: YES
user_id=849994
Originator: NO

Then, given that it's only four occurrences, I think we needn't bother with the 2.x line.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470
From noreply at sourceforge.net Sat Jan 27 18:43:16 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 09:43:16 -0800
Subject: [Patches] [ python-Patches-1638243 ] compiler.pycodegen causes crashes when compiling 'with'
Message-ID:

Patches item #1638243, was opened at 2007-01-18 03:52
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638243&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Parser/Compiler
Group: Python 2.5
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: kirat (kirat)
>Assigned to: Georg Brandl (gbrandl)
Summary: compiler.pycodegen causes crashes when compiling 'with'

Initial Comment:
The compiler package in the Python library is missing a LOAD/DELETE just before the WITH_CLEANUP instruction. Also, transformer isn't creating the with_var as an assignment. So the following little code snippet will crash if you compile and run it with compiler.compile():

class TrivialContext:
    def __enter__(self):
        return self
    def __exit__(self, *exc_info):
        pass

def f():
    with TrivialContext() as tc:
        return 1

f()

The fix is just a few lines. I'm enclosing a patch against the Python 2.5 source. I've also added the above as a test case to the test_compiler.py file.

regards,
-Kirat

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 17:43

Message:
Logged In: YES
user_id=849994
Originator: NO

Thanks for the patch, this is fixed now in rev. 53575, 53576 (2.5).

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638243&group_id=5470

From noreply at sourceforge.net Sat Jan 27 19:00:25 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 10:00:25 -0800
Subject: [Patches] [ python-Patches-1634778 ] Add aliases for latin7/9/10 charsets
Message-ID:

Patches item #1634778, was opened at 2007-01-13 17:39
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634778&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.5
>Status: Closed
>Resolution: Accepted
Priority: 5
Private: No
Submitted By: Christoph Zwerschke (cito)
>Assigned to: Georg Brandl (gbrandl)
Summary: Add aliases for latin7/9/10 charsets

Initial Comment:
This patch adds the latin-7, latin-9 and latin-10 aliases in some places where they were missing (see http://mail.python.org/pipermail/python-list/2006-December/416921.html).

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 18:00

Message:
Logged In: YES
user_id=849994
Originator: NO

Committed in rev. 53578. I don't think this is backportable, since it adds a new "feature" -- referring to iso8859-15 by "latin9", for example.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634778&group_id=5470
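[A small interactive check of what the alias patch enables; this assumes an interpreter with the new aliases applied, since on an unpatched 2.5 the 'latin9' lookup raises LookupError.]

    import codecs

    # 'latin9' now resolves to the iso8859-15 codec.
    print codecs.lookup('latin9').name              # -> 'iso8859-15'

    # iso8859-15 is the Latin variant with the euro sign, at 0xA4.
    print repr(u'\u20ac'.encode('latin9'))          # -> '\xa4'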
From noreply at sourceforge.net Sat Jan 27 20:10:02 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 11:10:02 -0800
Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers
Message-ID:

Patches item #1641790, was opened at 2007-01-22 09:00
Message generated for change (Comment added) made by josiahcarlson
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.6
Status: Closed
Resolution: Invalid
Priority: 5
Private: No
Submitted By: TH (therve)
Assigned to: Vinay Sajip (vsajip)
Summary: logging leaks loggers

Initial Comment:
In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately, logging leaks loggers by keeping them in an internal dict (the loggerDict attribute of Manager). Attached is a patch using a weakref object, with a test.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-01-27 11:10

Message:
Logged In: YES
user_id=341410
Originator: NO

pitrou: you aren't understanding vsajip. Either factories or custom classes are *trivial* to write to "do the right thing". If I understand what you are doing, you have been doing...

class connection:
    def __init__(self, ...):
        self.logger = logging.getLogger()
    def foo(self, ...):
        self.logger.info(message)  # or equivalent debug, warning, etc.

If you define the following class:

class loggingwrapper(object):
    __slots__ = ['socketinfo']
    def __init__(self, socketinfo):
        self.socketinfo = str(socketinfo)
    def __getattr__(self, attr):
        fcn = getattr(logging.getLogger(''), attr)
        def f2(msg, *args, **kwargs):
            return fcn("%s %s" % (self.socketinfo, str(msg)), *args, **kwargs)
        return f2

You can then do *almost* the exact same thing you were doing before...

class connection:
    def __init__(self, ...):
        self.logger = loggingwrapper(socketinfo)  # note the change
    def foo(self, ...):
        self.logger.info(message)

And it will work as you want.

----------------------------------------------------------------------

Comment By: Antoine Pitrou (pitrou)
Date: 2007-01-23 09:06

Message:
Logged In: YES
user_id=133955
Originator: NO

Ok, since I was the one bitten by this bug I might as well add my 2 cents to the discussion.

vsajip:
> 1. Use the 'extra' parameter (added in Python 2.5).

This is not practical. I want to define a prefix once and for all, for all log messages that will be output in a given context. Explicitly adding a parameter to every log call does not help. (Of course I can write wrappers to do this automatically - and that's what I ended up doing - but then I must write six of them: one for each of "debug", "info", "warning", "error", "critical", and "exception"...)

> 2. Use a connection-specific factory to obtain the logging message, or
> wrap the logging call on a connection-specific object which inserts the
> connection info.

I don't even know what this means, but it sounds way overkill...

> 3. Use something other than a literal string for the message - as
> documented, any object can be used as the message, and the logging
> system calls str() on it to get the actual text of the message. The
> "something" can be an instance of a class which Does The Right Thing.

IIUC this means some explicit machinery on each logging call, since I have to wrap every string in a constructor. Just like the "extra" parameter, with a slightly different flavour.

It's disturbing that the logging module has so many powerful options but no way of conveniently doing simple things without creating memory leaks...

----------------------------------------------------------------------

Comment By: TH (therve)
Date: 2007-01-23 00:54

Message:
Logged In: YES
user_id=1038797
Originator: YES

OK, I understand the design. But it's not clear in the documentation that once you've called getLogger('id') the logger will live forever. It's especially problematic on long-running processes.
It would be great to have at least a warning in the documentation about this feature.

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2007-01-23 00:42

Message:
Logged In: YES
user_id=308438
Originator: NO

This is not a leak - it's by design. You are not using best practice when you create a logger per client; the specific scenario of getting connection info into the logging message can currently be done in several ways, e.g.

1. Use the 'extra' parameter (added in Python 2.5).
2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info.
3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2007-01-22 20:47

Message:
Logged In: YES
user_id=33168
Originator: NO

Vinay, can you provide some direction? Thanks.

----------------------------------------------------------------------

Comment By: TH (therve)
Date: 2007-01-22 09:09

Message:
Logged In: YES
user_id=1038797
Originator: YES

Looking at the documentation, it seems keeping it is mandatory, because you must get the same instance back from getLogger. Maybe it'd need a documented way to remove a logger from the dict, though.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470
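[For the record, option 1 from Vinay's list needs very little code. A minimal sketch, with made-up field names: one shared logger, with the per-connection information carried by the 'extra' dict instead of by per-client logger objects, so nothing accumulates in Manager.loggerDict.]

    import logging

    logging.basicConfig(format="%(ip)s:%(port)s %(message)s")
    log = logging.getLogger("server")

    def handle_client(ip, port):
        # Each record gets the connection info without creating a new logger.
        log.warning("client connected", extra={"ip": ip, "port": port})

    handle_client("10.0.0.1", 2112)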
From noreply at sourceforge.net Sun Jan 28 03:48:24 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 18:48:24 -0800
Subject: [Patches] [ python-Patches-1641544 ] rlcompleter tab completion in pdb
Message-ID:

Patches item #1641544, was opened at 2007-01-22 06:52
Message generated for change (Comment added) made by rockyb
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641544&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Stephen Emslie (stephenemslie)
Assigned to: Nobody/Anonymous (nobody)
Summary: rlcompleter tab completion in pdb

Initial Comment:
By default, Pdb and other instances of Cmd complete names for their commands. However, in the context of pdb, I think it is more useful to complete identifiers and keywords in its current scope than to complete names of commands (most of which have single-letter abbreviations). I believe this makes pdb a far more usable introspection tool.

I have discussed this proposal on the python-ideas list:
http://mail.python.org/pipermail/python-ideas/2007-January/000084.html

This patch implements the following:
- creates an rlcompleter instance on Pdb if readline is available
- adds a 'complete' method to the Pdb class. The only difference from rlcompleter's default behaviour is that it also updates rlcompleter's namespace to reflect the current local and global namespace, which is necessary because pdb changes scope as it steps through a program

This is a patch against python/Lib/pdb.py rev. 51745.

----------------------------------------------------------------------

Comment By: Rocky Bernstein (rockyb)
Date: 2007-01-27 21:48

Message:
Logged In: YES
user_id=158581
Originator: NO

I experimented with this a little in the pydb variant (http://bashdb.sf.net/pydb). Some observations.

First, one can include the debugger commands in the namespace without too much trouble. See what's checked into CVS for pydb; in particular, look at the complete method of pydbbdb. (Personally, I think adding debugger commands to the list of completions is a little more honest.)

The second problem I have is that completion is not all that sensitive to the preceding context. If the line begins "step" or "1 + ", is it really correct to list all valid symbols?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641544&group_id=5470
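[The idea in the patch can be sketched in a few lines; this is an illustrative reimplementation, not the patch itself, and CompletingPdb is a made-up name. Cmd's readline hook is the complete() method, so overriding it to defer to rlcompleter over the current frame's namespaces gives identifier completion that follows the debugger from frame to frame.]

    import pdb
    import rlcompleter

    class CompletingPdb(pdb.Pdb):
        def complete(self, text, state):
            # Rebuild the namespace on every call, since pdb changes scope
            # as it steps through the program.
            namespace = self.curframe.f_globals.copy()
            namespace.update(self.curframe.f_locals)
            return rlcompleter.Completer(namespace).complete(text, state)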
From noreply at sourceforge.net Sun Jan 28 15:21:49 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 28 Jan 2007 06:21:49 -0800
Subject: [Patches] [ python-Patches-1646432 ] ConfigParser getboolean() consistency
Message-ID:

Patches item #1646432, was opened at 2007-01-28 16:21
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1646432&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Tal Einat (taleinat)
Assigned to: Nobody/Anonymous (nobody)
Summary: ConfigParser getboolean() consistency

Initial Comment:
Minor code change - made the getboolean() implementation more consistent with the other get...() methods (i.e. it uses _get). Functionality is unchanged.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1646432&group_id=5470
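[A guess at the shape of that change; the patch itself isn't shown here, and _convert_to_boolean is a hypothetical helper name. getint() and getfloat() go through RawConfigParser._get(), which applies a converter to get()'s result, so getboolean() can do the same with a small boolean converter.]

    import ConfigParser

    class SketchConfigParser(ConfigParser.RawConfigParser):
        def _convert_to_boolean(self, value):
            if value.lower() not in self._boolean_states:
                raise ValueError('Not a boolean: %s' % value)
            return self._boolean_states[value.lower()]

        def getboolean(self, section, option):
            # Same shape as getint()/getfloat(): route through _get().
            return self._get(section, self._convert_to_boolean, option)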
From noreply at sourceforge.net Mon Jan 29 09:41:08 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 00:41:08 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 16:13
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2007-01-29 09:41

Message:
Logged In: YES
user_id=21627
Originator: NO

I'm -1 on this patch. The introduction of a pipe makes it essentially gtk-specific: it will only work with gtk (for a while, until other frameworks catch up - which may take years), and it will only wake up a gtk thread that is in the gtk poll call. It fails to support cases where the main thread blocks in a different blocking call (i.e. neither select nor poll). I think a better mechanism is needed to support that case, e.g. waking up the main thread with pthread_kill.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 12:07:03 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 03:07:03 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 15:13
Message generated for change (Comment added) made by gustavo
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------

>Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2007-01-29 11:07

Message:
Logged In: YES
user_id=908
Originator: YES

But if you think about it, support for other cases has to be built as an extension of this patch. In an async handler it's not safe to do much of anything; the current framework is not async safe, it just happens to work most of the time.

If we use pthread_kill we start to enter platform-specific code. What happens on systems without POSIX threads? What signal do we use to wake up the main thread? Do system calls that receive signals return EINTR on a given platform or not (can we guarantee it always happens)? Which one is the main thread, anyway?

In any case, anything we want to do can be layered on top of the Py_signal_pipe API in a very safe way, because reading from the pipe is decoupled from the async handler, so that handler is allowed to safely do anything it wants, like pthread_kill. But IMHO that part should be left out of Python; let the frameworks do it themselves.
The introduction of a pipe makes it essentially gtk-specific: It will only work with gtk (for a while, until other frameworks catch up - which may take years), and it will only wake up a gtk thread that is in the gtk poll call. It fails to support cases where the main thread blocks in a different blocking call (i.e. neither select nor poll). I think a better mechanism is needed to support that case, e.g. by waking up the main thread with pthread_kill.

----------------------------------------------------------------------
Comment By: Jp Calderone (kuran) Date: 2007-01-25 19:22
Message: Logged In: YES user_id=366566 Originator: NO

The attached patch also fixes a bug in the order in which signal handlers are run. Previously, they would be run in numerically ascending signal number order. With the patch attached, they will be run in the order they are processed by Python.

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus) Date: 2007-01-25 18:38
Message: Logged In: YES user_id=12364 Originator: NO

gustavo, there are two patches attached and it's not entirely clear which one is current. Please delete the older one.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-25 18:11
Message: Logged In: YES user_id=908 Originator: YES

File Added: python-signals.diff

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-25 17:57
Message: Logged In: YES user_id=908 Originator: YES

Damn this SF bug tracker! ;( The patch I uploaded (yes, it was me, not anonymous) fixes some bugs and also fixes http://www.python.org/sf/1643738

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus) Date: 2006-09-29 22:09
Message: Logged In: YES user_id=12364

I'm concerned about the interface to PyOS_InterruptOccurred(). The original version peeked ahead for only that signal, and handled it manually. No need to report errors. The new version will first call arbitrary python functions to handle any earlier signals, then an arbitrary python function for the interrupt itself, and then will not report any errors they produce. It may not even get to the interrupt, even if one is waiting. I'm not sure PyOS_InterruptOccurred() is only called at times when arbitrary python code is acceptable. I suspect it should be dropped entirely, in favour of a more robust API. Otoh, some of it appears quite crufty. One version in intrcheck.c lacks a return statement, invoking undefined behavior in C.

One other concern I have is that signalmodule.c should never be unloaded, if loaded via dlopen. A delayed signal handler may reference it indefinitely. However, I see no sane way to enforce this.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2006-09-28 16:31
Message: Logged In: YES user_id=908

> ...sizeof(char) will STILL return 1 in such a case...

Even if sizeof(char) == 1, 'sizeof(signum_c)' is much more readable than just a plain '1'.

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 13:37:14 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 04:37:14 -0800
Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function
Message-ID:

Patches item #1633807, was opened at 2007-01-12 18:13 Message generated for change (Comment added) made by anthonybaxter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Nobody/Anonymous (nobody)
Summary: from __future__ import print_function

Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0. Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah.

----------------------------------------------------------------------
>Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-29 23:37
Message: Logged In: YES user_id=29957 Originator: YES

Attached version 3 of the patch. I've added an '#if 0'd warning in ast.c - for instance, when enabled, you get:

./setup.py:1336: SyntaxWarning: print no longer a statement in Py3.0

I'll make a new version of a -W py3k patch that enables this as well. I've made the other cleanup suggested by twouters. I'm not clear on the best way to do the tests for this - the from __future__ needs to be at the start of the file. My concern is that anything that tries to compile the whole test file with this under a previous version will choke and die on the print-as-function. Not sure if this is a hugely bad problem or not. Docs will follow once I bother wrapping my head around LaTeX and figuring out the best way to do the docs. I'm guessing we need a note in ref/ref6.tex in the section on the print statement, another bit in the same file in the subsection on Future statements, and something in lib/libbltin.tex. Did I miss anywhere? In current 3.0, the builtin is called Print, not print. Is there a reason for this? Is it just a matter of updating all the tests and ripping out the support for the print statement and the related opcodes? If so, I'll tackle that next.
Doing this does mean that the docs and the stdlib and the tests will all need a huge amount of updating, and it will make merging from the trunk to the p3yk branch much more painful. While I'm in the vague area - why is PRINT_ITEM inlined in ceval.c? Couldn't it be punted out to a separate function, making the main switch statement just that little bit smaller? I can't imagine that making 'print' that little tiny bit faster is actually worthwhile, compared to shrinking the main switch statement. except E as V, I'll look at later in a different patch. My tree is already getting quite cluttered with uncommitted patches :-)

File Added: print_function.patch3

----------------------------------------------------------------------
Comment By: Thomas Wouters (twouters) Date: 2007-01-18 02:58
Message: Logged In: YES user_id=34209 Originator: NO

You seem to have '#if 0'ed-out some code related to the with/as-statement warnings; I suggest just removing them. Since you're in this code now, it might make sense to provide a commented-out warning about the use of the print statement, so we won't have to figure it out later (in Python 2.9 or when we add -Wp3yk.) It needs a test, and probably a doc change somewhere.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-18 02:24
Message: Logged In: YES user_id=6380 Originator: NO

I don't think we need to do anything special for exec, as the exec(s, locals, globals) syntax is already (still :-) supported in 2.x with identical semantics as in 3.0. except E as V *syntax* can go in without a future stmt; and (only when that syntax is used) it should also enforce the new semantics (V must be a simple name and is deleted at the end of the except clause). I think Anthony's patch is a great idea, but I'll refrain from reviewing it. I'd say "just do it". :-)

----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 18:42
Message: Logged In: YES user_id=33168 Originator: NO

Guido, this is the patch I was talking about wrt supporting a print function in 2.6. exec could get similar treatment. You mentioned in mail that things like except E as V: can go in without a future stmt. I agree.

----------------------------------------------------------------------
Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-12 18:31
Message: Logged In: YES user_id=29957 Originator: YES

Updated version of patch - fixes interactive mode, adds builtins.print

File Added: print_function.patch

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470

From noreply at sourceforge.net Mon Jan 29 17:57:32 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 08:57:32 -0800
Subject: [Patches] [ python-Patches-1615158 ] POSIX capabilities support
Message-ID:

Patches item #1615158, was opened at 2006-12-13 18:10 Message generated for change (Comment added) made by gj0aqzda You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615158&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Matt Kern (gj0aqzda)
Assigned to: Nobody/Anonymous (nobody)
Summary: POSIX capabilities support

Initial Comment: Attached is a patch which adds POSIX capabilities support. The following API functions are supported:
* cap_clear
* cap_copy_ext
* cap_dup
* cap_from_text
* cap_get_flag
* cap_get_proc
* cap_init
* cap_set_flag
* cap_set_proc
* cap_size
* cap_to_text
The following API function is supported, but is broken with certain versions of libcap (I am running debian testing's libcap1, version 1.10-14, which has an issue; I have reported this upstream):
* cap_copy_int
The following API functions are in there as stubs, but currently are not compiled. I need access to a machine to test these. I will probably add autoconf tests for availability of these functions in due course:
* cap_get_fd
* cap_get_file
* cap_set_fd
* cap_set_file
The patch includes diffs to configure. My autoconf is however at a different revision to that used on the python trunk. You may want to re-autoconf configure.in. I've added a few API tests to test_posix.py.

----------------------------------------------------------------------
>Comment By: Matt Kern (gj0aqzda) Date: 2007-01-29 16:57
Message: Logged In: YES user_id=1667774 Originator: YES

No news on these patches in a while. To summarise, the patches are ready to go in. The issues surrounding cap_copy_int(), cap_get_*() and cap_set_*() are pretty minor. The vast majority of uses will be of the cap_get_proc(), cap_set_flag(), cap_set_proc() variety. I am not trying to hassle you; I know you don't have enough time to get through everything. However, I'll hang fire on future development of stuff that I, personally, am not going to use, until I know when/if these patches are going to go in.

----------------------------------------------------------------------
Comment By: Matt Kern (gj0aqzda) Date: 2006-12-19 10:48
Message: Logged In: YES user_id=1667774 Originator: YES

I've attached a documentation patch, which should be applied in addition to the base patch.

File Added: patch-svn-doc.diff

----------------------------------------------------------------------
Comment By: Georg Brandl (gbrandl) Date: 2006-12-16 13:25
Message: Logged In: YES user_id=849994 Originator: NO

(If you don't want to write LaTeX, it's enough to write the docs in plaintext, there are a few volunteers who will convert it appropriately.)

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2006-12-16 12:28
Message: Logged In: YES user_id=21627 Originator: NO

Can you please provide documentation changes as well?
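To make the proposed interface concrete, a hypothetical usage sketch follows. The function and method names come from the API lists in this item; the CAP_* constants and the exact argument shapes are assumptions modelled on the libcap C API (cap_set_flag(3) takes a flag, a list of capabilities, and a value there) and may well be spelled differently by the actual patch:

    # Hypothetical sketch of the proposed posix capabilities API; the
    # CAP_* constant names and call signatures are assumed, not confirmed.
    import posix

    caps = posix.cap_get_proc()        # capability state of this process
    print posix.cap_to_text(caps)      # textual form, as cap_to_text(3) gives

    # Drop an effective capability, mirroring libcap's
    # cap_set_flag(caps, CAP_EFFECTIVE, 1, [CAP_NET_RAW], CAP_CLEAR):
    caps.cap_set_flag(posix.CAP_EFFECTIVE, [posix.CAP_NET_RAW], posix.CAP_CLEAR)
    caps.cap_set_proc()                # install the modified state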
----------------------------------------------------------------------
Comment By: Matt Kern (gj0aqzda) Date: 2006-12-13 18:12
Message: Logged In: YES user_id=1667774 Originator: YES

I should further add that I have implemented the following API calls as methods of the new CapabilityState object in addition to the standard functions:
* cap_clear
* cap_copy_ext
* cap_dup
* cap_get_flag
* cap_set_flag
* cap_set_proc
* cap_size
* cap_to_text

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615158&group_id=5470

From noreply at sourceforge.net Mon Jan 29 18:07:29 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 09:07:29 -0800
Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function
Message-ID:

Patches item #1633807, was opened at 2007-01-12 02:13 Message generated for change (Comment added) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Nobody/Anonymous (nobody)
Summary: from __future__ import print_function

----------------------------------------------------------------------
>Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-29 12:07
Message: Logged In: YES user_id=80475 Originator: NO

Instead of __future__ imports, it would be better to put all of this Py3.0 stuff in a single compatibility module and keep the rest of Py2.x as clean as possible.
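For illustration, this is roughly what code under the proposed future import would look like (a sketch only; the sep/end keyword behaviour assumes builtin_print is copied from the p3yk trunk as described in the initial comment):

    from __future__ import print_function

    # With the future import, 'print' is an ordinary builtin function
    # rather than a statement, so it takes 3.0-style keyword arguments
    # and can be aliased and passed around like any other object.
    print("spam", "eggs", sep=", ")   # -> spam, eggs
    log = print                       # impossible with the print statement
    log("quest complete", end="!\n")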
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470

From noreply at sourceforge.net Mon Jan 29 19:36:40 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 10:36:40 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 16:13 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis) Date: 2007-01-29 19:36
Message: Logged In: YES user_id=21627 Originator: NO

Can you please explain in what sense the current framework isn't "async safe"? You might be referring to "async-signal-safe functions", which is a term specified by POSIX, referring to functions that may be called in a signal handler. The Python signal handler, signal_handler, calls these functions:
* getpid
* Py_AddPendingCall
* PyOS_setsig
** sigemptyset
** sigaction
AFAICT, this is the complete list of functions called in a signal handler. Of these, only getpid, sigemptyset, and sigaction are library functions, and they are all specified as async-signal safe. So the current implementation is async-signal safe. Usage of pthread_kill wouldn't make it more platform-specific than your patch. pthread_kill is part of the POSIX standard, and so is pipe(2). So both changes work on a POSIX system, and neither change would be portable if all you have is standard C.
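The mechanism under debate, the self-pipe trick, can be sketched in pure Python (illustrative only: the patch does this with a C-level handler, and Py_signal_pipe / Py_signal_pipe_w name its C API, not anything importable; the names below are made up):

    import fcntl
    import os
    import select
    import signal

    # Self-pipe trick: the signal handler only write()s one byte; an event
    # loop (gtk's poll, a select() reactor, ...) watches the read end for
    # readability and then lets the normal signal machinery run.
    pipe_r, pipe_w = os.pipe()
    for fd in (pipe_r, pipe_w):
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    def wakeup(signum, frame):
        try:
            os.write(pipe_w, chr(signum))  # write() is async-signal safe
        except OSError:
            pass  # pipe full: a wakeup is already pending, dropping is fine

    signal.signal(signal.SIGUSR1, wakeup)
    os.kill(os.getpid(), signal.SIGUSR1)   # handler runs, writes one byte

    # The loop must only poll/select for readability, not consume the byte:
    readable, _, _ = select.select([pipe_r], [], [], 0)
    assert pipe_r in readable

The non-blocking write is what keeps a burst of signals from blocking the handler when the pipe fills up, at the cost of coalescing some of them.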
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 19:53:08 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 10:53:08 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 15:13 Message generated for change (Comment added) made by gustavo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------
>Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-29 18:53
Message: Logged In: YES user_id=908 Originator: YES

Py_AddPendingCall is not async safe. It's obvious looking at the code, and it even says so in a comment:

/* XXX Begin critical section */
/* XXX If you want this to be safe against nested
   XXX asynchronous calls, you'll have to work harder! */

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 20:30:04 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 11:30:04 -0800
Subject: [Patches] [ python-Patches-1615158 ] POSIX capabilities support
Message-ID:

Patches item #1615158, was opened at 2006-12-13 19:10 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615158&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Matt Kern (gj0aqzda) Assigned to: Nobody/Anonymous (nobody) Summary: POSIX capabilities support Initial Comment: Attached is a patch which adds POSIX capabilities support. The following API functions are supported: * cap_clear * cap_copy_ext * cap_dup * cap_from_text * cap_get_flag * cap_get_proc * cap_init * cap_set_flag * cap_set_proc * cap_size * cap_to_text The following API function is supported, but is broken with certain versions of libcap (I am running debian testing's libcap1, version 1.10-14, which has an issue; I have reported this upstream): * cap_copy_int The following API functions are in there as stubs, but currently are not compiled. I need access to a machine to test these. I will probably add autoconf tests for availability of these functions in due course: * cap_get_fd * cap_get_file * cap_set_fd * cap_set_file The patch includes diffs to configure. My autoconf is however at a different revision to that used on the python trunk. You may want to re-autoconf configure.in. I've added a few API tests to test_posix.py. ---------------------------------------------------------------------- >Comment By: Martin v. L?wis (loewis) Date: 2007-01-29 20:30 Message: Logged In: YES user_id=21627 Originator: NO The patch cannot go in in its current form (I started applying it, but then found that I just can't do it). It contains conditional, commented out code. Either the code is correct, then it should be added, or it is incorrect, in which case it should be removed entirely. There shouldn't be any work-in-progress code in the Python repository whatsoever. This refers to both the if 0 blocks (which I thought I can safely delete), as well as commented-out entries in CapabilityStateMethods (for which I didn't know what to do). So while you are revising it, I have a few remarks: - you can safely omit the generated configure changes from the patch - I will regenerate them, anyway. - please follow the alphabet in the header files in configure.in (bsdtty.h < capabilities.h) - please don't expose method on objects on which they aren't methods. E.g. cap_clear is available both as a method and a module-level function; that can't be both right (there should be one way to do it) Following the socket API, I think offering these as methods is reasonable - try avoiding the extra copy in copy_ext (copying directly into the string). If you keep malloc calls, don't return NULL without setting a Python exception. - use the "s" format for copy_int and from_text - consider using booleans for [gs]et_flags ---------------------------------------------------------------------- Comment By: Matt Kern (gj0aqzda) Date: 2007-01-29 17:57 Message: Logged In: YES user_id=1667774 Originator: YES No news on these patches in a while. To summarise, the patches are ready to go in. The issues surrounding cap_copy_int(), cap_get_*() and cap_set_*() are pretty minor. The vast majority of uses will be of the cap_get_proc(), cap_set_flag(), cap_set_proc() variety. I am not trying to hassle you; I know you don't have enough time to get through everything. However, I'll hang fire on future development of stuff that I, personally, am not going to use, until I know when/if these patches are going to go in. 
---------------------------------------------------------------------- Comment By: Matt Kern (gj0aqzda) Date: 2006-12-19 11:48 Message: Logged In: YES user_id=1667774 Originator: YES I've attached a documentation patch, which should be applied in addition to the base patch. File Added: patch-svn-doc.diff ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-12-16 14:25 Message: Logged In: YES user_id=849994 Originator: NO (If you don't want to write LaTeX, it's enough to write the docs in plaintext, there are a few volunteers who will convert it appropriately.) ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2006-12-16 13:28 Message: Logged In: YES user_id=21627 Originator: NO Can you please provide documentation changes as well? ---------------------------------------------------------------------- Comment By: Matt Kern (gj0aqzda) Date: 2006-12-13 19:12 Message: Logged In: YES user_id=1667774 Originator: YES I should further add that I have implemented the following API calls as methods of the new CapabilityState object in addition to the standard functions: * cap_clear * cap_copy_ext * cap_dup * cap_get_flag * cap_set_flag * cap_set_proc * cap_size * cap_to_text ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615158&group_id=5470 From noreply at sourceforge.net Mon Jan 29 21:01:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 29 Jan 2007 12:01:27 -0800 Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe Message-ID: Patches item #1564547, was opened at 2006-09-24 16:13 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Gustavo J. A. M. Carneiro (gustavo) Assigned to: Nobody/Anonymous (nobody) Summary: Py_signal_pipe Initial Comment: Problem: how to wakeup extension modules running poll() so that they can let python check for signals. Solution: use a pipe to communicate between signal handlers and main thread. The read end of the pipe can then be monitored by poll/select for input events and wake up poll(). As a side benefit, it avoids the usage of Py_AddPendingCall / Py_MakePendingCalls, which are patently not "async safe". All explained in this thread: http://mail.python.org/pipermail/python-dev/2006-September/068569.html ---------------------------------------------------------------------- >Comment By: Martin v. L?wis (loewis) Date: 2007-01-29 21:01 Message: Logged In: YES user_id=21627 Originator: NO I see. I think this can be fixed fairly easily: install the signal handlers with sigaction, and prevent any nested delivery of signals through sa_mask. Then, no two signal handlers will get invoked simultaneously. ---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-29 19:53 Message: Logged In: YES user_id=908 Originator: YES Py_AddPendingCall is not async safe. 
It's obvious looking at the code, and it even says so in a comment: /* XXX Begin critical section */ /* XXX If you want this to be safe against nested XXX asynchronous calls, you'll have to work harder! */ ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2007-01-29 19:36 Message: Logged In: YES user_id=21627 Originator: NO Can you please explain in what sense the current framework isn't "async safe"? You might be referring to "async-signal-safe functions", which is a term specified by POSIX, referring to functions that may be called in a signal handler. The Python signal handler, signal_handler, calls these functions: * getpid * Py_AddPendingCall * PyOS_setsig ** sigemptyset ** sigaction AFAICT, this is the complete list of functions called in a signal handler. Of these, only getpid, sigemptyset, and sigaction are library functions, and they are all specified as async-signal safe. So the current implementation is async-signal safe. Usage of pthread_kill wouldn't make it more platform-specific than your patch. pthread_kill is part of the POSIX standard, and so is pipe(2). So both changes work on a POSIX system, and neither change would be portable if all you have is standard C. ---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-29 12:07 Message: Logged In: YES user_id=908 Originator: YES But if you think about it, support for other cases have to be extensions of this patch. In an async handler it's not safe to do about anything. The current framework is not async safe, it just happens to work most of the time. If we use pthread_kill we will start to enter platform-specific code; what will happen in systems without POSIX threads? What signal do we use to wake up the main thread? Do system calls that receive signals return EINTR for this platform or not (can we guarantee it always happens)? Which one is the main thread anyway? In any case, anything we want to do can be layered on top of the Py_signal_pipe API in a very safe way, because reading from a pipe is decoupled from the async handler, therefore this handler is allowed to safely do anything it wants, like pthread_kill. But IMHO that part should be left out of Python; let the frameworks do it themselves. ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2007-01-29 09:41 Message: Logged In: YES user_id=21627 Originator: NO I'm -1 on this patch. The introduction of a pipe makes it essentially gtk-specific: It will only work with gtk (for a while, until other frameworks catch up - which may take years), and it will only wake up a gtk thread that is in the gtk poll call. It fails to support cases where the main thread blocks in a different blocking call (i.e. neither select nor poll). I think a better mechanism is needed to support that case, e.g. by waking up the main thread with pthread_kill. ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2007-01-25 20:22 Message: Logged In: YES user_id=366566 Originator: NO The attached patch also fixes a bug in the order in which signal handlers are run. Previously, they would be run in numerically ascending signal number order. With the patch attached, they will be run in the order they are processed by Python. 
----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2007-01-25 19:38
Message: Logged In: YES user_id=12364 Originator: NO

gustavo, there are two patches attached and it's not entirely clear which one is current. Please delete the older one.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2007-01-25 19:11
Message: Logged In: YES user_id=908 Originator: YES

File Added: python-signals.diff

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2007-01-25 18:57
Message: Logged In: YES user_id=908 Originator: YES

Damn this SF bug tracker! ;( The patch I uploaded (yes, it was me, not anonymous) fixes some bugs and also fixes http://www.python.org/sf/1643738

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2006-09-29 23:09
Message: Logged In: YES user_id=12364

I'm concerned about the interface to PyOS_InterruptOccurred(). The original version peeked ahead for only that signal, and handled it manually. No need to report errors. The new version will first call arbitrary Python functions to handle any earlier signals, then an arbitrary Python function for the interrupt itself, and then will not report any errors they produce. It may not even get to the interrupt, even if one is waiting. I'm not sure PyOS_InterruptOccurred() is called when arbitrary Python code is acceptable. I suspect it should be dropped entirely, in favour of a more robust API. OTOH, some of it appears quite crufty. One version in intrcheck.c lacks a return statement, invoking undefined behavior in C.

One other concern I have is that signalmodule.c should never be unloaded, if loaded via dlopen. A delayed signal handler may reference it indefinitely. However, I see no sane way to enforce this.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2006-09-28 17:31
Message: Logged In: YES user_id=908

> ...sizeof(char) will STILL return 1 in such a case...

Even if sizeof(char) == 1, 'sizeof(signum_c)' is much more readable than just a plain '1'.

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2006-09-28 04:50
Message: Logged In: YES user_id=12364

Any compiler where sizeof(char) != 1 is *deeply* broken. In C, a byte isn't always 8 bits (if it uses bits at all!). It's possible for a char to take (for instance) 32 bits, but sizeof(char) will STILL return 1 in such a case. A mention of this in the wild is here: http://lkml.org/lkml/1998/1/22/4 If you find a compiler that's broken, I'd love to hear about it. :)

# error Too many signals to fit on an unsigned char!
Should be "in", not "on" :)

A comment in signal_handler() about ignoring the return value of write() may be good.

initsignal() should keep, not replace, Py_signal_pipe/Py_signal_pipe_w if called a second time (which is possible, right?). If so, it should probably not set them until after setting non-blocking mode.

check_signals() should not call PyEval_CallObject(Handlers[signum].func, ...) if func is NULL, which may happen after finisignal() clears it.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2006-09-27 16:34
Message: Logged In: YES user_id=908

and of course this

> * PyErr_SetInterrupt() needs to set is_tripped after the call to write(), not before.

is correct, good catch. New patch uploaded.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2006-09-27 15:42
Message: Logged In: YES user_id=908

> * Needs documentation ...

True, I'll try to add more documentation...

> * I think we should be more paranoid about the range of possible signals. NSIG does not appear to be defined by SUSv2 (no clue about POSIX). We should size the Handlers array to UCHAR_MAX and set any signals outside the range of 0..UCHAR_MAX to either 0 (null signal) or UCHAR_MAX. I'm not sure we should ever use NSIG.

I disagree. Creating an array of size UCHAR_MAX is just wasting memory. If you check the original Python code, there's already fallback code to define NSIG if it's not already defined (if not defined, it could end up being defined as 64).

> * In signal_handler() sizeof(signum_c) is inherently 1. ;)

And? I occasionally hear horror stories of platforms where sizeof(char) != 1, I'm not taking any chances :)

> * PyOS_InterruptOccurred() should probably still check that it's called from the main thread.

check_signals already bails out if that is the case. But in fact it bails out without setting the interrupt_occurred output parameter, so I fixed that.

fcntl error checking... will work on it.

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2006-09-27 01:53
Message: Logged In: YES user_id=12364

I've looked over the patch, although I haven't tested it. I have the following suggestions:
* Needs documentation explaining the signal weirdness (may drop signals, may delay indefinitely, new handlers may get signals intended for old, etc)
* Needs to be explicit that users must only poll/select to check for readability of the pipe, NOT read from it
* The comment for is_tripped refers to sigcheck(), which doesn't exist
* I think we should be more paranoid about the range of possible signals. NSIG does not appear to be defined by SUSv2 (no clue about POSIX). We should size the Handlers array to UCHAR_MAX and set any signals outside the range of 0..UCHAR_MAX to either 0 (null signal) or UCHAR_MAX. I'm not sure we should ever use NSIG.
* In signal_handler() sizeof(signum_c) is inherently 1. ;)
* The set_nonblock macro doesn't check for errors from fcntl(). I'm not sure it's worth having a macro for that anyway.
* Needs some documentation of the assumptions about read()/write() being memory barriers.
* In check_signals() sizeof(signum) is inherently 1.
* There's a blank line with tabs near the end of check_signals() ;)
* PyErr_SetInterrupt() should use a compile-time check for SIGINT being within 0..UCHAR_MAX, assuming NSIG is ripped out entirely.
* PyErr_SetInterrupt() needs to set is_tripped after the call to write(), not before.
* PyOS_InterruptOccurred() should probably still check that it's called from the main thread.
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 23:11:16 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 14:11:16 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 08:13
Message generated for change (Comment added) made by rhamphoryncus
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2007-01-29 15:11
Message: Logged In: YES user_id=12364 Originator: NO

As far as I can tell, the sig_mask argument of sigaction only applies to the thread in which the signal handler gets called. If you have multiple threads you could still have one signal handler running per thread. http://www.opengroup.org/onlinepubs/009695399/functions/sigaction.html
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 23:25:21 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 14:25:21 -0800
Subject: [Patches] [ python-Patches-1647484 ] gzip.GzipFile has no name attribute
Message-ID:

Patches item #1647484, was opened at 2007-01-29 23:25
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1647484&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Lars Gustäbel (gustaebel)
Assigned to: Nobody/Anonymous (nobody)
Summary: gzip.GzipFile has no name attribute

Initial Comment:
The gzip.GzipFile object uses a filename instead of a name attribute. This deviates from the standard practice and the interface described in "3.9 File Objects" and seems unnecessary.
Attached patch changes this but still leaves the filename attribute as a property that emits a DeprecationWarning.

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1647484&group_id=5470
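The compatibility shim proposed for GzipFile above (a new name attribute, with the old filename kept as a warning property) is a common deprecation pattern. A minimal sketch using a hypothetical stand-in class, since the real change lives inside gzip.GzipFile itself:

    import warnings

    class GzipFileLike(object):
        # Hypothetical stand-in for gzip.GzipFile; only the attribute
        # handling from the proposed patch is sketched here.
        def __init__(self, name):
            self.name = name  # the new, file-object-style attribute

        @property
        def filename(self):
            # Old spelling keeps working, but is flagged as deprecated.
            warnings.warn("use the name attribute instead of filename",
                          DeprecationWarning, stacklevel=2)
            return self.name

    f = GzipFileLike("data.gz")
    print f.name      # "data.gz", the supported spelling
    print f.filename  # same value, plus a DeprecationWarning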
From noreply at sourceforge.net Mon Jan 29 23:25:42 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 14:25:42 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 16:13
Message generated for change (Comment added) made by loewis
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2007-01-29 23:25
Message: Logged In: YES user_id=21627 Originator: NO

Right. To prevent the simultaneous invocation of Py_AddPendingCall from multiple threads, two alternatives are possible:
a) protect the routine with a thread mutex, if threading is available
b) use pthread_kill in threads other than the main thread (as I proposed earlier); those other threads then wouldn't call Py_AddPendingCall anymore
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Tue Jan 30 00:59:39 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 15:59:39 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 08:13
Message generated for change (Comment added) made by rhamphoryncus
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2007-01-29 16:59
Message: Logged In: YES user_id=12364 Originator: NO

Unfortunately, neither the mutex functions nor pthread_kill() are listed as async-signal-safe: http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html

Personally, I'd be just as happy to raise an exception if an attempt is made to import both signal and threading: doing it safely and reliably is just too difficult, so we shouldn't promote a false sense of security.
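Part of what makes the signal/threading combination so awkward: CPython only ever runs Python-level signal handlers in the main thread, no matter which thread the OS delivers the signal to, so a main thread stuck in a blocking call means handlers simply do not run. A small POSIX-only demonstration (illustration, not part of the patch):

    import signal
    import threading
    import time

    def handler(signum, frame):
        # CPython routes Python-level signal handlers to the main thread.
        print "handler ran in:", threading.currentThread().getName()

    signal.signal(signal.SIGALRM, handler)

    worker = threading.Thread(target=time.sleep, args=(3,), name="worker")
    worker.start()            # this thread never runs the handler

    signal.alarm(1)           # ask the kernel for a SIGALRM in one second
    signal.pause()            # block the main thread until a signal arrives
    # prints: handler ran in: MainThread
    worker.join()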
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Tue Jan 30 01:52:20 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 16:52:20 -0800
Subject: [Patches] [ python-Patches-1638033 ] Add httponly to Cookie module
Message-ID:

Patches item #1638033, was opened at 2007-01-17 20:07
Message generated for change (Comment added) made by jjlee
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Arvin Schnell (arvins)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add httponly to Cookie module

Initial Comment:
Add the Microsoft extension httponly to the Cookie module.

----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2007-01-30 00:52
Message: Logged In: YES user_id=261020 Originator: NO

This is backwards-incompatible, no? The behaviour of Morsel.set() changes (disallowing key="httponly"), hence the behaviour of BaseCookie.__setitem__ changes. Do you have a use case?

----------------------------------------------------------------------
Comment By: Arvin Schnell (arvins)
Date: 2007-01-19 17:01
Message: Logged In: YES user_id=698939 Originator: YES

Sure, I have added some documentation to the patch.
File Added: python.diff

----------------------------------------------------------------------
Comment By: Jim Jewett (jimjjewett)
Date: 2007-01-19 15:06
Message: Logged In: YES user_id=764593 Originator: NO

The documentation change should say what the attribute does. (It requests that the cookie be hidden from JavaScript, and available only to HTTP requests.)

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470

From noreply at sourceforge.net Tue Jan 30 02:34:40 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 17:34:40 -0800
Subject: [Patches] [ python-Patches-1508475 ] transparent gzip compression in liburl2
Message-ID:

Patches item #1508475, was opened at 2006-06-19 09:59
Message generated for change (Comment added) made by jjlee
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1508475&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Modules
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jakob Truelsen (antialize)
Assigned to: Nobody/Anonymous (nobody)
Summary: transparent gzip compression in liburl2

Initial Comment:
Some webservers support gzipping content before sending it; this patch adds transparent support for this in urllib2 (documentation: http://www.http-compression.com/). This patch *requires* patch 914340 as a prerequisite, as it enables stream support in the gzip library.
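While the patch itself is not reproduced here, the effect it aims for can be approximated without modifying urllib2, using an ordinary processor that advertises gzip and undoes it on the way back. This sketch buffers the whole body in memory; streaming decompression is exactly the part that needs patch 914340. The GzipHandler name is made up for the sketch:

    import gzip
    import urllib2
    from StringIO import StringIO

    class GzipHandler(urllib2.BaseHandler):
        # Advertise gzip support, and transparently decompress responses.
        def http_request(self, req):
            req.add_header("Accept-Encoding", "gzip")
            return req

        def http_response(self, req, resp):
            if resp.headers.get("Content-Encoding") == "gzip":
                body = gzip.GzipFile(fileobj=StringIO(resp.read())).read()
                old = resp
                resp = urllib2.addinfourl(StringIO(body), old.headers,
                                          old.url, old.code)
                resp.msg = old.msg
            return resp

    opener = urllib2.build_opener(GzipHandler())
    print opener.open("http://www.example.com/").read()[:200]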
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2007-01-30 01:34
Message: Logged In: YES user_id=261020 Originator: NO

Looks good. This needs tests and docs. As a new feature, this could not be released until Python 2.6. It would be nice to have support for managing content negotiation in general, but that wish isn't an obstacle to this patch.

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1508475&group_id=5470

From noreply at sourceforge.net Tue Jan 30 03:15:41 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 18:15:41 -0800
Subject: [Patches] [ python-Patches-1550272 ] Add a test suite for unittest
Message-ID:

Patches item #1550272, was opened at 2006-09-01 03:44
Message generated for change (Comment added) made by jjlee
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1550272&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tests
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Collin Winter (collinwinter)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add a test suite for unittest

Initial Comment:
This file replaces the current version of Lib/test/test_unittest.py, which only contains a single test. The attached suite contains 128 tests for the mission-critical parts of unittest. A patch will follow shortly that fixes the bugs in unittest uncovered by this test suite.

----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2007-01-30 02:15
Message: Logged In: YES user_id=261020 Originator: NO

Oh the irony. :) This is good stuff. I have not reviewed the whole patch, but sampling bits of it, it looks fine. No great danger in committing this, so why not commit it? Of the following points, I think only the first should block commit of this patch. Any comments on that first point?

1. test_loadTestsFromName__module_not_loaded() and test_loadTestsFromNames__module_not_loaded() -- these may break in future, and may break e.g. only when running tests in random order, which is sometimes done when debugging obscure stuff. Better to introduce a module of your own in Lib/test that's guaranteed not to be loaded already -- maybe test_unittest_fodder.py. Still, that wouldn't help the case where somebody is running the tests in a loop, which would cause failures already (again, this is something people do as part of bug detection / removal). I don't know the import internals and I hear they're messy, but perhaps just del sys.modules[module_name] at the start of each of those two methods is at least an improvement over what they do now.

2. Would be helpful to list what remains to be tested (for example, there is no test of assertRaises)

3. Why no use of .assertRaises?

4. Would be nice to resolve some of the XXXes, but I realise that this may be difficult/impossible given the requirement for backwards-compatibility

----------------------------------------------------------------------
Comment By: Collin Winter (collinwinter)
Date: 2006-09-01 03:52
Message: Logged In: YES user_id=1344176

That promised patch for unittest is #1550273.
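jjlee's point 1 - tests that silently assume a module has not been imported yet - is usually cured by forcing the precondition rather than hoping for it. A sketch of the del sys.modules[...] idea; the choice of sndhdr as a rarely-preloaded stdlib module is an arbitrary assumption of the sketch:

    import sys
    import unittest

    class LoadTestsFromNameTest(unittest.TestCase):
        module_name = "sndhdr"  # any stdlib module unlikely to be preloaded

        def setUp(self):
            # Force the "not yet imported" precondition so the test also
            # passes when run repeatedly or in random order.
            self.was_loaded = sys.modules.pop(self.module_name, None)

        def tearDown(self):
            if self.was_loaded is not None:
                sys.modules[self.module_name] = self.was_loaded

        def test_loads_unloaded_module(self):
            self.assertTrue(self.module_name not in sys.modules)
            suite = unittest.TestLoader().loadTestsFromName(self.module_name)
            self.assertTrue(isinstance(suite, unittest.TestSuite))

    if __name__ == "__main__":
        unittest.main()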
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1550272&group_id=5470

From noreply at sourceforge.net Tue Jan 30 03:32:04 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 18:32:04 -0800
Subject: [Patches] [ python-Patches-1486713 ] HTMLParser : A auto-tolerant parsing mode
Message-ID:

Patches item #1486713, was opened at 2006-05-11 18:19
Message generated for change (Comment added) made by jjlee
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1486713&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: kxroberto (kxroberto)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser : A auto-tolerant parsing mode

Initial Comment:
Changes:
* Now allows missing spaces between attributes, as it's often seen on the web, like this :