From noreply at sourceforge.net Mon Jan 1 07:25:14 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 31 Dec 2006 22:25:14 -0800 Subject: [Patches] [ python-Patches-1620174 ] Improve platform.py usability on Windows Message-ID: Patches item #1620174, was opened at 2006-12-21 22:49 Message generated for change (Comment added) made by infidel You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1620174&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Luke Dunstan (infidel) Assigned to: M.-A. Lemburg (lemburg) Summary: Improve platform.py usability on Windows Initial Comment: This patch modifies platform.py to remove most of the dependencies on pywin32, and use the standard ctypes and _winreg modules instead. It also adds support for Windows CE. ---------------------------------------------------------------------- >Comment By: Luke Dunstan (infidel) Date: 2007-01-01 14:25 Message: Logged In: YES user_id=30442 Originator: YES Why does platform.py need to be compatible with earlier versions of Python? The return types haven't changed, and I think the return values won't change because the same OS APIs are being used. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-01 02:49 Message: Logged In: YES user_id=38388 Originator: NO I haven't looked at the patch yet, so just a few general comments on changes to platform.py: * the code must continue to work with Python versions prior to 2.6 This means that ctypes and _winreg support may be added as an option, but removing pywin32 calls is not the right way to proceed. * changes in return type of the public and documented APIs are not possible If you have a need for more information, then a new API should be added, or the information merged into one of the existing return fields. * changes in the return values of APIs due to use of different OS APIs must be avoided There's code out there relying on the return values, so if in doubt a new API must be provided. ---------------------------------------------------------------------- Comment By: Luke Dunstan (infidel) Date: 2006-12-31 13:57 Message: Logged In: YES user_id=30442 Originator: YES 1. Yes this is intended for 2.6 2. The only difference between win32api.RegQueryValueEx and _winreg.QueryValueEx seems to be that the latter returns Unicode strings. I have adjusted the patch to be more compatible with the old behaviour. 3. I have updated the doc string in the new patch. File Added: platform-wince-2.diff ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-31 08:13 Message: Logged In: YES user_id=764593 Originator: NO ( win32api.RegQueryValueEx is _winreg.QueryValueEx ) ? If not, it should wait for 2.6, and there should be an entry in what's new. (I suppose similar concerns exist for other return classes.) The change to win32_ver only half-corrects the return type to the four-tuple. The meaning of release (even if it is just "release name") should be specified in the text. 
def win32_ver(release='',version='',csd='',ptype=''):
    """ Get additional version information from the Windows Registry
-       and return a tuple (version,csd,ptype) referring to version
+       and return a tuple (release,version,csd,ptype) referring to version
        number, CSD level and OS type (multi/single processor).

---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1620174&group_id=5470 From noreply at sourceforge.net Wed Jan 3 00:50:23 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 02 Jan 2007 15:50:23 -0800 Subject: [Patches] [ python-Patches-1626538 ] update to PEP 344 - exception attributes Message-ID: Patches item #1626538, was opened at 2007-01-02 18:50 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Nobody/Anonymous (nobody) Summary: update to PEP 344 - exception attributes Initial Comment: PEP 344 proposes adding __traceback__, __context__, and __cause__ attributes to Exception. The primary objection has been that the __traceback__ attribute would cause a cycle, which would delay resource release. This objection is now added to the PEP, along with some details about why it is a problem, and why weakrefs aren't a straightforward solution.
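For concreteness, a minimal sketch (mine, not from the patch or the PEP text) of the resource-release problem: an exception that carries its own traceback keeps the traceback's frames, and hence their local variables, alive for as long as the exception itself is referenced. Runnable on released Python 3:

    class Witness:
        deleted = False
        def __del__(self):
            Witness.deleted = True

    def boom():
        w = Witness()              # stands in for a scarce resource
        try:
            1 / 0
        except ZeroDivisionError as exc:
            return exc             # exception leaves with its traceback attached

    e = boom()
    print('w' in e.__traceback__.tb_frame.f_locals)  # True: frame locals still alive
    print(Witness.deleted)         # False: the "resource" is not yet released
    del e                          # dropping the exception frees the frames...
    print(Witness.deleted)         # True: ...and only now is the resource released

If the handler's local binding (exc above) also survived, exception -> traceback -> frame -> exc would form a true cycle, collectable only by the garbage collector; released Python 3 implicitly deletes the except-block name on exit for exactly that reason.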
---------------------------------------------------------------------- >Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 18:56 Message: Logged In: YES user_id=764593 Originator: YES http://mail.python.org/pipermail/python-3000/2007-January/005322.html Guido said he could check it in if Ping agrees, so I'm assigning the patch to ping (who I *hope* is Ka-Ping Yee) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 From noreply at sourceforge.net Wed Jan 3 01:00:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 02 Jan 2007 16:00:22 -0800 Subject: [Patches] [ python-Patches-1626538 ] update to PEP 344 - exception attributes Message-ID: Patches item #1626538, was opened at 2007-01-02 18:50 Message generated for change (Comment added) made by jimjjewett You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Ka-Ping Yee (ping) Summary: update to PEP 344 - exception attributes Initial Comment: PEP 344 proposes adding __traceback__, __context__, and __cause__ attributes to Exception. The primary objection has been that the __traceback__ exception would cause a cycle, which would delay resource release. This objection is now added to the PEP, along with some details about why it is a problem, and why weakrefs aren't a straightforward solution. ---------------------------------------------------------------------- >Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 19:00 Message: Logged In: YES user_id=764593 Originator: YES File Added: pep344diff.txt ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 18:56 Message: Logged In: YES user_id=764593 Originator: YES http://mail.python.org/pipermail/python-3000/2007-January/005322.html Guido said he could check it in if Ping agrees, so I'm assigning the patch to ping (who I *hope* is Ka-Ping Yee) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 From noreply at sourceforge.net Wed Jan 3 01:30:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 02 Jan 2007 16:30:33 -0800 Subject: [Patches] [ python-Patches-1626538 ] update to PEP 344 - exception attributes Message-ID: Patches item #1626538, was opened at 2007-01-02 15:50 Message generated for change (Comment added) made by ping You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Ka-Ping Yee (ping) Summary: update to PEP 344 - exception attributes Initial Comment: PEP 344 proposes adding __traceback__, __context__, and __cause__ attributes to Exception. 
The primary objection has been that the __traceback__ exception would cause a cycle, which would delay resource release. This objection is now added to the PEP, along with some details about why it is a problem, and why weakrefs aren't a straightforward solution. ---------------------------------------------------------------------- >Comment By: Ka-Ping Yee (ping) Date: 2007-01-02 16:30 Message: Logged In: YES user_id=45338 Originator: NO Okay, it will take me a moment to page this back into my head and respond. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 16:00 Message: Logged In: YES user_id=764593 Originator: YES File Added: pep344diff.txt ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 15:56 Message: Logged In: YES user_id=764593 Originator: YES http://mail.python.org/pipermail/python-3000/2007-January/005322.html Guido said he could check it in if Ping agrees, so I'm assigning the patch to ping (who I *hope* is Ka-Ping Yee) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 From noreply at sourceforge.net Wed Jan 3 16:21:36 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 07:21:36 -0800 Subject: [Patches] [ python-Patches-1627052 ] backticks will not be used at all Message-ID: Patches item #1627052, was opened at 2007-01-03 10:21 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Nobody/Anonymous (nobody) Summary: backticks will not be used at all Initial Comment: In python 3, backticks will not mean repr. Every few months, someone suggests a new meaning for them. This clarifies that they won't be reused at all. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 From noreply at sourceforge.net Wed Jan 3 16:22:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 07:22:51 -0800 Subject: [Patches] [ python-Patches-1627052 ] backticks will not be used at all Message-ID: Patches item #1627052, was opened at 2007-01-03 10:21 Message generated for change (Settings changed) made by jimjjewett You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) >Assigned to: Georg Brandl (gbrandl) Summary: backticks will not be used at all Initial Comment: In python 3, backticks will not mean repr. Every few months, someone suggests a new meaning for them. This clarifies that they won't be reused at all. 
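For readers who never used the spelling in question, a two-line illustration (mine, not from the patch) of what is being removed and what replaces it:

    value = [1, 2, 3]
    # Python 2 allowed:  s = `value`   -- shorthand for repr(value)
    s = repr(value)      # the only spelling in Python 3; backticks are a SyntaxError
    print(s)             # prints: [1, 2, 3]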
---------------------------------------------------------------------- >Comment By: Jim Jewett (jimjjewett) Date: 2007-01-03 10:22 Message: Logged In: YES user_id=764593 Originator: YES Assigning to PEP owner, Georg. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 From noreply at sourceforge.net Thu Jan 4 00:46:07 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 15:46:07 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 23:46 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399 Definitely a backport candidate. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Thu Jan 4 03:53:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 18:53:30 -0800 Subject: [Patches] [ python-Patches-1626538 ] update to PEP 344 - exception attributes Message-ID: Patches item #1626538, was opened at 2007-01-02 15:50 Message generated for change (Comment added) made by ping You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Ka-Ping Yee (ping) Summary: update to PEP 344 - exception attributes Initial Comment: PEP 344 proposes adding __traceback__, __context__, and __cause__ attributes to Exception. The primary objection has been that the __traceback__ exception would cause a cycle, which would delay resource release. This objection is now added to the PEP, along with some details about why it is a problem, and why weakrefs aren't a straightforward solution. ---------------------------------------------------------------------- >Comment By: Ka-Ping Yee (ping) Date: 2007-01-03 18:53 Message: Logged In: YES user_id=45338 Originator: NO I've checked in this change. Thanks for writing the patch. ---------------------------------------------------------------------- Comment By: Ka-Ping Yee (ping) Date: 2007-01-02 16:30 Message: Logged In: YES user_id=45338 Originator: NO Okay, it will take me a moment to page this back into my head and respond. 
---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 16:00 Message: Logged In: YES user_id=764593 Originator: YES File Added: pep344diff.txt ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-02 15:56 Message: Logged In: YES user_id=764593 Originator: YES http://mail.python.org/pipermail/python-3000/2007-January/005322.html Guido said he could check it in if Ping agrees, so I'm assigning the patch to ping (who I *hope* is Ka-Ping Yee) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1626538&group_id=5470 From noreply at sourceforge.net Thu Jan 4 04:59:16 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 19:59:16 -0800 Subject: [Patches] [ python-Patches-1624059 ] fast subclasses of builtin types Message-ID: Patches item #1624059, was opened at 2006-12-29 01:01 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Guido van Rossum (gvanrossum) Summary: fast subclasses of builtin types Initial Comment: This is similar to a patch posted on python-dev a few months ago (or more). I modified it to also handle subclassing exceptions, which should speed up exception handling a bit. (This was proposed by Guido based on the original patch.) I also dropped an extra bit that was going to indicate if it was a builtin type or a subclass of a builtin type. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-03 22:59 Message: Logged In: YES user_id=6380 Originator: NO This looks fine, but I have some questions about alternative implementations:

- Why does the typical PyFoo_Check() macro first call PyFoo_CheckExact() before calling the fast bit checking macro? Did you measure that this is in fact faster? True, it means always a pointer deref, so maybe it is -- but OTOH it is more instructions.

- Why not have a separate bit for each type? Then you could make the fast macro test for (flags & mask) != 0 instead of testing for (flags & mask) == value. It would use up all the remaining bits, but I suspect there are some unused (or reusable) bits in lower positions: 1L<<2 is unused (was GC), and 1L<<11 also seems unused. And bits 18 through 23! And I'm guessing that INPLACEOPS (1L<<3) isn't all that interesting any more; they were introduced in 2.0... So it really looks like you have plenty of bits. Of course I don't know if it matters; it would perhaps be worth looking at the machine code.

- Oops, it looks like your comment is off. You claim to be using bits 24-27, leaving 28-31 free, but in fact you're using bits 28-31!

BTW, you're introducing quite a few lines over 80 chars. Perhaps cut back a bit?
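A toy model in Python (bit positions are hypothetical, not CPython's actual layout) of the two designs being compared above: one bit per builtin type, tested with (flags & mask) != 0, versus a packed value field, tested with (flags & mask) == value:

    # One bit per type: a single AND answers "is this a subclass of list?"
    FLAG_INT_SUBCLASS  = 1 << 23     # hypothetical positions
    FLAG_LONG_SUBCLASS = 1 << 24
    FLAG_LIST_SUBCLASS = 1 << 25

    def check_bit(tp_flags, mask):
        return (tp_flags & mask) != 0

    flags_of_list_subclass = FLAG_LIST_SUBCLASS
    print(check_bit(flags_of_list_subclass, FLAG_LIST_SUBCLASS))  # True
    print(check_bit(flags_of_list_subclass, FLAG_INT_SUBCLASS))   # False

    # Packed field: a small type id in four bits needs an exact compare.
    FIELD_MASK = 0xF << 28
    LIST_ID    = 0x3 << 28           # hypothetical id

    def check_field(tp_flags, value):
        return (tp_flags & FIELD_MASK) == value

    print(check_field(LIST_ID, LIST_ID))                          # True

Either way, the point of the patch stands: the subclass check becomes a flag test on the type object instead of a call through PyType_IsSubtype().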
---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-29 01:04 Message: Logged In: YES user_id=33168 Originator: YES I forgot to mention that this patch works by using unused bits in tp_flags. This saves a function call when checking for a subclass of a builtin type. There's one funky thing about this patch: the change to Objects/exceptions.c. I didn't investigate why this was necessary, or more likely I did know why when I added it and forgot. I know that without adding BASE_EXC_SUBCLASS to tp_flags, test_exceptions fails. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 From noreply at sourceforge.net Thu Jan 4 05:30:54 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 20:30:54 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 15:53 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-03 23:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 01:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-27 22:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things.
The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-27 20:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-27 20:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-27 19:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session:

    >>> x = object()
    >>> import sys
    >>> sys.getrefcount(x)
    2
    >>> for i in range(100):
    ...     def f(x: x): pass
    ...
    >>> del f
    >>> sys.getrefcount(x)
    102
    >>>

At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 14:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for:

    def intmin(*a: int) -> int: pass

...help(intmin) will display:

    intmin(*a: int) -> int

File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 10:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 16:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 15:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch.
File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 15:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 17:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 13:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 04:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 03:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? 
(Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-19 20:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 12:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-03 20:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
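The non-nested part of this syntax is runnable on released Python 3, where the patch's func_annotations attribute ended up spelled __annotations__ (and tuple parameters were later dropped entirely). A short demonstration:

    def f(a: int, b: "a label" = 0) -> str:
        return str(a + b)

    # Argument names map to the evaluated annotation expressions;
    # the return annotation is stored under the key 'return'.
    print(f.__annotations__)
    # {'a': <class 'int'>, 'b': 'a label', 'return': <class 'str'>}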
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 05:57:08 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 20:57:08 -0800 Subject: [Patches] [ python-Patches-1548388 ] set comprehensions Message-ID: Patches item #1548388, was opened at 2006-08-29 04:33 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1548388&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Georg Brandl (gbrandl) Assigned to: Georg Brandl (gbrandl) Summary: set comprehensions Initial Comment: This is a big one: * cleanup grammar; unifies listcomp/genexp grammar which means that [x for x in 1, 2] is no longer valid * cleanup comprehension compiling code (unifies all AST code for the three comprehensions and most of the compile.c code) * add set comprehensions This patch modifies list comprehensions to be implemented more like generator expressions: in a separate function, which means that the loop variables will not leak any more. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-03 23:57 Message: Logged In: YES user_id=6380 Originator: NO There was some discussion on the py3k list about Raymond's suggestion. Are you thinking of doing that? I'd really like to see the syntactic changes and additions from this patch, but I agree that for list/set comps we can do without the extra stack frame. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-09-08 13:13 Message: Logged In: YES user_id=80475 Genexps necessarily need a separate stack frame to achieve saved execution state (including the instruction pointer and local variable). Also, it was simplest to implement genexps in terms of the existing and proven code for regular generators. For list and set comps, I think you can take a simpler approach and just rename the inner loop variable to something invisible. That will make it faster, make the disassemby readable, and make it easier to follow in pdb. Also, we get to capitalize on proven code -- they only difference is that the induction variable won't be visible to surrounding code. Since what you have works, I would say just check it in; however, it would probably never get touched again and an early, arbitrary design choice would get set in stone. My bet is that the renaming approach will result in a much simpler patch. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-09-06 11:36 Message: Logged In: YES user_id=6380 I always assumed that the genexps *require* being a function because that's the only way to create a generator. But that argument doesn't apply to listcomps. That's about all I know of the implementation of these.. :-( Have you asked python-dev? 
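The observable behavior the patch is after (and that released Python 3 adopted) can be summarized in a few lines:

    squares = {x * x for x in range(5)}   # set comprehension
    print(squares)                        # {0, 1, 4, 9, 16}

    # The comprehension runs in its own scope, so the loop
    # variable no longer leaks into the surrounding namespace:
    try:
        x
    except NameError:
        print("x did not leak")

    # And the unified grammar rejects the bare tuple form:
    #   [i for i in 1, 2]                 # SyntaxError
    #   [i for i in (1, 2)]               # write this instead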
---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-09-06 03:03 Message: Logged In: YES user_id=849994 It is complete, it works and it does not leak the loop variable(s). The question is whether it is okay for listcomps and setcomps to be in their own anonymous function, which slows listcomps down compared to the 2.x branch. I don't know why the function approach was taken for genexps, but I suspect it was because who implemented it then saw this as the best way to hide the loop variable. Perhaps somebody else more familiar with the internals and the previous discussions can look over it. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-09-06 02:48 Message: Logged In: YES user_id=6380 Do you think this is ready to be checked in, or are you still working on it? ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-09-01 05:38 Message: Logged In: YES user_id=849994 Since you can put anything usable as an assignment target after the "for" of a listcomp, just renaming might be complicated. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-08-31 19:40 Message: Logged In: YES user_id=6380 +1. Would this cause problems for abominations like this though? >>> a=[1] >>> list(tuple(a) for a[0] in "abc") [('a',), ('b',), ('c',)] >>> a ['c'] >>> ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-08-31 19:15 Message: Logged In: YES user_id=80475 Would it be an oversimplfication for list and set comps to keep everything in one code block and just hide the list loop variables by renaming them: x --> __[x] That approach would only require a minimal patch, and it would make for a cleaner disassembly. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-08-31 15:55 Message: Logged In: YES user_id=849994 Attaching slightly revised patch and bytecode comparison. ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-08-29 18:30 Message: Logged In: YES user_id=80475 Can you post a before and disassembly of some list and set comprehensions. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-08-29 15:09 Message: Logged In: YES user_id=849994 test_compiler and test_transformer fail because the compiler package hasn't been updated yet. test_dis fails because list comprehensions now generate different bytecode. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-08-29 13:59 Message: Logged In: YES user_id=6380 Nice! I see failures in 4 tests: test_compiler test_dis test_transformer test_univnewlines test_univnewlines is trivial (it's deleting a variable leaked out of a list comprehension); haven't looked at the rest in detail ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-08-29 04:34 Message: Logged In: YES user_id=849994 The previously attached patch contains only the important files. The FULL patch (attached now) also contains syntax fixes in python files so that the test suite is mostly passing. 
Note that the compiler package isn't ready yet. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1548388&group_id=5470 From noreply at sourceforge.net Thu Jan 4 06:17:01 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 21:17:01 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 05:17 Message: Logged In: YES user_id=24100 Originator: YES I think everyone will have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 06:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 03:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things.
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 01:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 01:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 00:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session: >>> x = object() >>> import sys >>> sys.getrefcount(x) 2 >>> for i in range(100): ... def f(x: x): pass ... >>> del f >>> sys.getrefcount(x) 102 >>> At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 19:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for: def intmin(*a: int) -> int: pass ...help(intmin) will display: intmin(*a: int) -> int File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 15:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 21:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 20:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 20:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. 
test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 22:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 18:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 09:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 08:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? (Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? 
I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-20 01:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 17:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-04 01:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
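On the getargspec question debated in this thread: what released Python 3 eventually did matches Guido's suggestion -- the old functions were left alone, and a new one, inspect.getfullargspec(), was added to expose keyword-only arguments and annotations (the demo function below is illustrative):

    import inspect

    def f(a, *, b: int = 0) -> str:
        return str(a)

    spec = inspect.getfullargspec(f)
    print(spec.args)            # ['a']  -- positional arguments only
    print(spec.kwonlyargs)      # ['b']  -- invisible to the old getargspec
    print(spec.kwonlydefaults)  # {'b': 0}
    print(spec.annotations)     # {'b': <class 'int'>, 'return': <class 'str'>}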
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 06:22:47 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 21:22:47 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 15:53 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 00:22 Message: Logged In: YES user_id=6380 Originator: NO Well, it depends on the context whether that matters. The kw-only args could just be included in the positional args (which have names anyway) and that wouldn't be so bad for some apps. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 00:17 Message: Logged In: YES user_id=24100 Originator: YES I think everyone should update have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-03 23:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 01:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-27 22:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things. 
The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-27 20:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-27 20:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-27 19:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session: >>> x = object() >>> import sys >>> sys.getrefcount(x) 2 >>> for i in range(100): ... def f(x: x): pass ... >>> del f >>> sys.getrefcount(x) 102 >>> At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 14:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for: def intmin(*a: int) -> int: pass ...help(intmin) will display: intmin(*a: int) -> int File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 10:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 16:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 15:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch. 
File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 15:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 17:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 13:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 04:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 03:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? 
(Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-19 20:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 12:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-03 20:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
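A hypothetical session illustrating why the func_annotations dictionary is built when MAKE_FUNCTION runs rather than baked into the code object: annotation expressions are evaluated in the enclosing scope, so each execution of a nested def can produce different annotation values.

>>> def make_check(expected):
...     def check(value: expected) -> "matches":
...         return value == expected
...     return check
...
>>> make_check("red").func_annotations['value']
'red'
>>> make_check("blue").func_annotations['value']
'blue'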
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 07:55:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 22:55:56 -0800 Subject: [Patches] [ python-Patches-1494140 ] Documentation for new Struct object Message-ID: Patches item #1494140, was opened at 2006-05-24 02:26 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1494140&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 2.5 Status: Open Resolution: None Priority: 6 Private: No Submitted By: Bob Ippolito (etrepum) Assigned to: Nobody/Anonymous (nobody) Summary: Documentation for new Struct object Initial Comment: The performance enhancements to the struct module (patch #1493701) are implemented by having a Struct object, which is a compiled structure. This text file documents these new struct objects. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-03 22:55 Message: Logged In: YES user_id=33168 Originator: NO Even if this only documents part of the API, it seems like it would be better to get that in and finish it off later. Anyone know what's going on with this? ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-10-29 01:28 Message: Logged In: YES user_id=849994 What's the status of this? It should have been in 2.5 final... ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-08-02 00:38 Message: Logged In: YES user_id=849994 New/renamed functions need a \versionadded/changed. For StructObjects, I'd suggest a sentence like "Struct objects are new in version 2.5" at the top of the section. There's no explanation how to create a Struct object. The constructor must be explained, preferably on the module overview page. Isn't the type name "Struct"? ---------------------------------------------------------------------- Comment By: George Yoshida (quiver) Date: 2006-07-30 10:33 Message: Logged In: YES user_id=671362 > Does this patch still need to be updated for pack_to() I suppose so and hence updated my patch. (1) document pack_into(pack_to is renamed to pack_into). (2) document pack_into/pack_from as module functions too(just like re module) As for the function name change, I've already updated "what's new in 2.5" in r50985. I guess the patch is ready to be applied. Reviews are appreciated. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2006-07-29 12:28 Message: Logged In: YES user_id=11375 Does this patch still need to be updated for pack_to(), or can it just be applied? ---------------------------------------------------------------------- Comment By: George Yoshida (quiver) Date: 2006-07-10 10:26 Message: Logged In: YES user_id=671362 Patch for the TeX style doc. Bob, can you work on updating the main section right after 2.5 b2? 
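For reference, the object under discussion is the compiled-format type added to the struct module in 2.5; a minimal sketch of its use (values are illustrative):

import struct
import array

s = struct.Struct('>HH')         # compiled format: two big-endian unsigned shorts
data = s.pack(1, 2)              # packs to the 4-byte string '\x00\x01\x00\x02'
assert s.unpack(data) == (1, 2)
assert s.size == 4               # byte length of the packed representation

# pack_into/unpack_from operate on an existing writable buffer at an offset:
buf = array.array('c', '\x00' * 8)
s.pack_into(buf, 4, 1, 2)
assert s.unpack_from(buf, 4) == (1, 2)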
---------------------------------------------------------------------- Comment By: Bob Ippolito (etrepum) Date: 2006-05-26 06:05 Message: Logged In: YES user_id=139309 We're going to need to revise this patch some more to document the new pack_to function (for Martin Blais' hotbuf work) Additionally we'll probably also want to revise the main struct documentation to talk about bounds checking and avoiding the creation of long objects. ---------------------------------------------------------------------- Comment By: Bob Ippolito (etrepum) Date: 2006-05-25 07:32 Message: Logged In: YES user_id=139309 That's clearly a typo. I've attached a new version of the patch that removes those two letters. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-05-24 14:03 Message: Logged In: YES user_id=764593 Shouldn't self.size be the number of bytes required to *pack * the structure? The number required to *unpack* seems like it ought to include tuple overhead and such... ---------------------------------------------------------------------- Comment By: Bob Ippolito (etrepum) Date: 2006-05-24 08:35 Message: Logged In: YES user_id=139309 New patch attached, fixed unpack documentation, added unpack_from method. ---------------------------------------------------------------------- Comment By: Bob Ippolito (etrepum) Date: 2006-05-24 07:54 Message: Logged In: YES user_id=139309 Hold up on this patch, I need to revise it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1494140&group_id=5470 From noreply at sourceforge.net Thu Jan 4 08:12:16 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 03 Jan 2007 23:12:16 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 07:12 Message: Logged In: YES user_id=24100 Originator: YES For getargs and getargvalues, including the names in positional args is an excellent strategy. There are uses (in cgitb) in the stdlib for getargvalues that then wouldn't need to be changed. 
The 2 uses of getargspec in the stdlib (one of which I missed, in DocXMLRPCServer) are both closely followed by formatargspec. I think those APIs should change or information will be lost. Alternatively, a new function (hopefully with a better name than getfullargspec :) could be made and getargspec could retain its API, but raise an error when keyword-only arguments are present.

def getargspec(func):
    args, varargs, kwonlyargs, kwdefaults, varkw, defaults, ann = getfullargspec(func)
    if kwonlyargs:
        raise ValueError("function has keyword-only arguments, use getfullargspec!")
    return args, varargs, varkw, defaults

I'll update the patch to fix getargvalues and DocXMLRPCServer this weekend. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 18:53:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 09:53:56 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 15:53 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP.
The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 12:53 Message: Logged In: YES user_id=6380 Originator: NO I like the following approach: (1) the old API continues to work for all functions, but provides incomplete information (not losing the kw-only args completely, but losing the fact that they are kw-only); (2) add a new API that provides all the relevant information. Maybe the new API should not return a 7-tuple but rather a structure with named attributes; that makes it more future-proof. Sorry, I don't have any good suggestions for new names. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Thu Jan 4 18:56:38 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 09:56:38 -0800 Subject: [Patches] [ python-Patches-1628061 ] Win32: Fix build when you have TortoiseSVN but no .svn/* Message-ID: Patches item #1628061, was opened at 2007-01-04 17:56 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Fix build when you have TortoiseSVN but no .svn/* Initial Comment: Recent snazzy improvements to the Win32 build system include embedding SVN version information in the builds. This is done by compiling a short C file, make_buildinfo.c, and running the result. make_buildinfo.exe runs the liltingly-named SubWCRev.exe--a tool that comes with TortoiseSVN--over one of the source files, ../Modules/getbuildinfo.c, producing a second file, getbuildinfo2.c. The code is reasonably smart; if you don't have TortoiseSVN, it doesn't bother trying, and just compiles ../Modules/getbuildinfo.c unmodified. However: it blindly assumes that if SubWCRev.exe exists, and the system() call to run it returns 0 or greater, getbuildinfo2.c must have been successfully created. If you have TortoiseSVN, but *don't* have the .svn/... directories in your source tree, system(SubWCRev.exe) returns 0 or greater (seemingly indicating success) but in fact fails and does *not* create getbuildinfo2.c. When it fails in this way I see this inscrutable message in the log: "C:\b\tortoisesvn\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c getbuildinfo2.c SubWCRev : Path '..' ends in '..', which is unsupported for this operation This patch changes make_buildinfo.c so that it calls _stat(getbuildinfo2.c) as a final step. If getbuildinfo2.c exists, it returns true, else it returns false. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 From noreply at sourceforge.net Thu Jan 4 18:59:49 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 09:59:49 -0800 Subject: [Patches] [ python-Patches-1628062 ] Win32: Add bytesobject.c to pythoncore.vcproj Message-ID: Patches item #1628062, was opened at 2007-01-04 17:59 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Build Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Add bytesobject.c to pythoncore.vcproj Initial Comment: Objects/bytesobject.c is a new C source in the distribution, and pythoncore won't build properly without it. This patch adds it for VC7. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 From noreply at sourceforge.net Thu Jan 4 22:37:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 13:37:50 -0800 Subject: [Patches] [ python-Patches-1628205 ] socket.readline() interface doesn't handle EINTR properly Message-ID: Patches item #1628205, was opened at 2007-01-04 13:37 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Maxim Sobolev (sobomax) Assigned to: Nobody/Anonymous (nobody) Summary: socket.readline() interface doesn't handle EINTR properly Initial Comment: The socket.readline() interface doesn't handle EINTR properly. Currently, when EINTR is received, the exception is not handled and all data that has been in the buffer is lost. There is no way to recover that data from the code that uses the interface. Correct behaviour would be to catch EINTR and restart recv(). Patch is attached. Following is a real-world example of how it affects the httplib module:

File "/usr/local/lib/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
File "/usr/local/lib/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
File "/usr/local/lib/python2.4/xmlrpclib.py", line 1131, in request
    errcode, errmsg, headers = h.getreply()
File "/usr/local/lib/python2.4/httplib.py", line 1137, in getreply
    response = self._conn.getresponse()
File "/usr/local/lib/python2.4/httplib.py", line 866, in getresponse
    response.begin()
File "/usr/local/lib/python2.4/httplib.py", line 336, in begin
    version, status, reason = self._read_status()
File "/usr/local/lib/python2.4/httplib.py", line 294, in _read_status
    line = self.fp.readline()
File "/usr/local/lib/python2.4/socket.py", line 325, in readline
    data = recv(1)
error: (4, 'Interrupted system call')

-Maxim ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 From noreply at sourceforge.net Fri Jan 5 02:10:07 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 17:10:07 -0800 Subject: [Patches] [ python-Patches-1628061 ] Win32: Fix build when you have TortoiseSVN but no .svn/* Message-ID: Patches item #1628061, was opened at 2007-01-04 18:56 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Build Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Fix build when you have TortoiseSVN but no .svn/* Initial Comment: Recent snazzy improvements to the Win32 build system include embedding SVN version information in the builds. This is done by compiling a short C file, make_buildinfo.c, and running the result. make_buildinfo.exe runs the liltingly-named SubWCRev.exe--a tool that comes with TortoiseSVN--over one of the source files, ../Modules/getbuildinfo.c, producing a second file, getbuildinfo2.c. The code is reasonably smart; if you don't have TortoiseSVN, it doesn't bother trying, and just compiles ../Modules/getbuildinfo.c unmodified. However: it blindly assumes that if SubWCRev.exe exists, and the system() call to run it returns 0 or greater, getbuildinfo2.c must have been successfully created. If you have TortoiseSVN, but *don't* have the .svn/... directories in your source tree, system(SubWCRev.exe) returns 0 or greater (seemingly indicating success) but in fact fails and does *not* create getbuildinfo2.c. When it fails in this way I see this inscrutable message in the log: "C:\b\tortoisesvn\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c getbuildinfo2.c SubWCRev : Path '..' ends in '..', which is unsupported for this operation This patch changes make_buildinfo.c so that it calls _stat(getbuildinfo2.c) as a final step. If getbuildinfo2.c exists, it returns true, else it returns false. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-05 02:10 Message: Logged In: YES user_id=21627 Originator: NO This patch shouldn't be necessary. make_buildinfo2 checks whether there is a .svn subdirectory, and if there is none, it compiles getbuildinfo.c (just like when subwcrev.exe wasn't found). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 From noreply at sourceforge.net Fri Jan 5 02:50:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 04 Jan 2007 17:50:51 -0800 Subject: [Patches] [ python-Patches-1628061 ] Win32: Fix build when you have TortoiseSVN but no .svn/* Message-ID: Patches item #1628061, was opened at 2007-01-04 17:56 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Build Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Fix build when you have TortoiseSVN but no .svn/* Initial Comment: Recent snazzy improvements to the Win32 build system include embedding SVN version information in the builds. This is done by compiling a short C file, make_buildinfo.c, and running the result. make_buildinfo.exe runs the liltingly-named SubWCRev.exe--a tool that comes with TortoiseSVN--over one of the source files, ../Modules/getbuildinfo.c, producing a second file, getbuildinfo2.c. The code is reasonably smart; if you don't have TortoiseSVN, it doesn't bother trying, and just compiles ../Modules/getbuildinfo.c unmodified. However: it blindly assumes that if SubWCRev.exe exists, and the system() call to run it returns 0 or greater, getbuildinfo2.c must have been successfully created. If you have TortoiseSVN, but *don't* have the .svn/... directories in your source tree, system(SubWCRev.exe) returns 0 or greater (seemingly indicating success) but in fact fails and does *not* create getbuildinfo2.c. When it fails in this way I see this inscrutable message in the log: "C:\b\tortoisesvn\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c getbuildinfo2.c SubWCRev : Path '..' ends in '..', which is unsupported for this operation This patch changes make_buildinfo.c so that it calls _stat(getbuildinfo2.c) as a final step. If getbuildinfo2.c exists, it returns true, else it returns false. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-05 01:50 Message: Logged In: YES user_id=364875 Originator: YES Good point. I seem to have goofed up my directory in a very specific way: when I made a copy of the tree, I explicitly did *not* copy the top-level .svn, but I forgot to do anything about the .svn directories in the subdirectories. make_buildinfo is run from the "PCbuild" directory, which still has a ".svn" directory, so the _stat(".svn") call succeeds. But the call to SubWCRev.exe fails because ".." (aka the Python root) doesn't have a ".svn" directory. I assert that the patch won't hurt anything, and will make the build process slightly more tolerant of goof-ups like me. If you prefer, I could submit an alternate patch where the current directory is the Python root and it writes to "PCbuild/getbuildinfo2.c". Or one where the stat checks for "../.svn" instead. Or if you don't want any patch at all, that works too, just close the patch. In the meantime, I'll clean up my build tree. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 From noreply at sourceforge.net Fri Jan 5 16:14:23 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 05 Jan 2007 07:14:23 -0800 Subject: [Patches] [ python-Patches-1520904 ] Fix tests that assume they can write to Lib/test Message-ID: Patches item #1520904, was opened at 2006-07-11 20:53 Message generated for change (Comment added) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1520904&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tests Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Douglas Greiman (dgreiman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix tests that assume they can write to Lib/test Initial Comment: A number of bsddb tests, as well as test_tarfile, create temporary files in Lib/ or {prefix}/lib/pythonX.Y/ . This change uses tempfile.gettempdir() instead. Tested on RedHat 9.0 Linux on x86.
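The fix being described follows a simple pattern; a hypothetical before/after sketch (the file name is invented for illustration):

import os
import tempfile

# Before: the test wrote into the source/installed library tree,
# which may be read-only for the user running the test suite.
# fname = os.path.join('Lib', 'test', 'temp.tar')

# After: write into the system temporary directory instead.
fname = os.path.join(tempfile.gettempdir(), 'temp.tar')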
---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2007-01-05 10:14 Message: Logged In: YES user_id=11375 Originator: NO Can you clarify in what cases test_tarfile writes to the current directory? ---------------------------------------------------------------------- Comment By: Matt Fleming (splitscreen) Date: 2006-08-31 07:07 Message: Logged In: YES user_id=1126061 This looks fine to me, and a worthwhile change. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1520904&group_id=5470 From noreply at sourceforge.net Fri Jan 5 16:52:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 05 Jan 2007 07:52:11 -0800 Subject: [Patches] [ python-Patches-1520904 ] Fix tests that assume they can write to Lib/test Message-ID: Patches item #1520904, was opened at 2006-07-11 20:53 Message generated for change (Comment added) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1520904&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tests Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Douglas Greiman (dgreiman) Assigned to: Nobody/Anonymous (nobody) Summary: Fix tests that assume they can write to Lib/test Initial Comment: A number of bsddb tests, as well as test_tarfile, create temporary files in Lib/ or {prefix}/lib/pythonX.Y/ . This change uses tempfile.gettempdir() instead. Tested on RedHat 9.0 Linux on x86. ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2007-01-05 10:52 Message: Logged In: YES user_id=11375 Originator: NO Committed the Lib/bsddb changes to the trunk in rev. 53264; thanks! That leaves only the tarfile change to commit, but I'd like to understand why it's necessary first. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2007-01-05 10:14 Message: Logged In: YES user_id=11375 Originator: NO Can you clarify in what cases test_tarfile writes to the current directory? ---------------------------------------------------------------------- Comment By: Matt Fleming (splitscreen) Date: 2006-08-31 07:07 Message: Logged In: YES user_id=1126061 This looks fine to me, and a worthwhile change. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1520904&group_id=5470 From noreply at sourceforge.net Sat Jan 6 08:36:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 05 Jan 2007 23:36:11 -0800 Subject: [Patches] [ python-Patches-1628062 ] Win32: Add bytesobject.c to pythoncore.vcproj Message-ID: Patches item #1628062, was opened at 2007-01-04 17:59 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. 
Category: Build Group: Python 3000 Status: Open Resolution: None >Priority: 7 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Add bytesobject.c to pythoncore.vcproj Initial Comment: Objects/bytesobject.c is a new C source in the distribution, and pythoncore won't build properly without it. This patch adds it for VC7. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-06 07:36 Message: Logged In: YES user_id=364875 Originator: YES Bumping the priority so it gets noticed. Fixing the build is mom-and-apple pie stuff, and getting this patch into the official tree will make my life a little more pleasant. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 From noreply at sourceforge.net Sat Jan 6 10:37:36 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 01:37:36 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. 
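As a rough mental model of the "lazy concatenation" idea, here is a pure-Python toy (the real patches work inside the C unicode object; this class and its names are only illustrative):

    # Toy model: a + b records the operands and defers the O(len) copy until
    # the first time the value is actually needed, then caches the result.
    class LazyConcat(object):
        def __init__(self, *pieces):
            self.pieces = list(pieces)       # no character copying here
            self.value = None
        def __add__(self, other):
            return LazyConcat(self, other)   # concatenation stays O(1)
        def __str__(self):
            if self.value is None:
                self.value = "".join(str(p) for p in self.pieces)
                self.pieces = None           # render once, cache forever
            return self.value

For example, str(LazyConcat("spam") + "eggs") is the point where the single deferred copy actually happens.
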
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sat Jan 6 14:49:34 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 05:49:34 -0800 Subject: [Patches] [ python-Patches-1628062 ] Win32: Add bytesobject.c to pythoncore.vcproj Message-ID: Patches item #1628062, was opened at 2007-01-04 18:59 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 7 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Add bytesobject.c to pythoncore.vcproj Initial Comment: Objects/bytesobject.c is a new C source in the distribution, and pythoncore won't build properly without it. This patch adds it for VC7. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 14:49 Message: Logged In: YES user_id=21627 Originator: NO Thanks for the patch. Committed as r53289. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-06 08:36 Message: Logged In: YES user_id=364875 Originator: YES Bumping the priority so it gets noticed. Fixing the build is mom-and-apple pie stuff, and getting this patch into the official tree will make my life a little more pleasant. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628062&group_id=5470 From noreply at sourceforge.net Sat Jan 6 14:50:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 05:50:50 -0800 Subject: [Patches] [ python-Patches-1628061 ] Win32: Fix build when you have TortoiseSVN but no .svn/* Message-ID: Patches item #1628061, was opened at 2007-01-04 18:56 Message generated for change (Settings changed) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 3000 >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Win32: Fix build when you have TortoiseSVN but no .svn/* Initial Comment: Recent snazzy improvements to the Win32 build system include embedding SVN version information in the builds. This is done by compiling a short C file, make_buildinfo.c, and running the result. make_buildinfo.exe runs the liltingly-named SubWCRev.exe--a tool that comes with TortoiseSVN--over one of the source files, ../Modules/getbuildinfo.c, producing a second file, getbuildinfo2.c. The code is reasonably smart; if you don't have TortoiseSVN, it doesn't bother trying, and just compiles ../Modules/getbuildinfo.c unmodified. However: it blindly assumes that if SubWCRev.exe exists, and the system() call to run it returns 0 or greater, getbuildinfo2.c must have been successfully created. If you have TortoiseSVN, but *don't* have the .svn/... directories in your source tree, system(SubWCRev.exe) returns 0 or greater (seemingly indicating success) but in fact fails and does *not* create getbuildinfo2.c. When it fails in this way I see this inscrutable message in the log:

    "C:\b\tortoisesvn\bin\subwcrev.exe" .. ..\Modules\getbuildinfo.c getbuildinfo2.c
    SubWCRev : Path '..' ends in '..', which is unsupported for this operation

This patch changes make_buildinfo.c so that it calls _stat(getbuildinfo2.c) as a final step. If getbuildinfo2.c exists, it returns true, else it returns false. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 14:50 Message: Logged In: YES user_id=21627 Originator: NO Ok, rejecting the patch then. The typical case would be that you have an exported tree, in which case there wouldn't be any .svn directories at all. As I'm sure you know, you can easily fix your installation by removing the .svn directory from PCBuild also. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-05 02:50 Message: Logged In: YES user_id=364875 Originator: YES Good point. I seem to have goofed up my directory in a very specific way: when I made a copy of the tree, I explicitly did *not* copy the top-level .svn, but I forgot to do anything about the .svn directories in the subdirectories. make_buildinfo is run from the "PCbuild" directory, which still has a ".svn" directory, so the _stat(".svn") call succeeds. But the call to SubWCRev.exe fails because ".." (aka the Python root) doesn't have a ".svn" directory. I assert that the patch won't hurt anything, and will make the build process slightly more tolerant of goof-ups like me. If you prefer, I could submit an alternate patch where the current directory is the Python root and it writes to "PCbuild/getbuildinfo2.c". Or one where the stat checks for "../.svn" instead. Or if you don't want any patch at all, that works too, just close the patch. In the meantime, I'll clean up my build tree. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-05 02:10 Message: Logged In: YES user_id=21627 Originator: NO This patch shouldn't be necessary. make_buildinfo2 checks whether there is a .svn subdirectory, and if there is none, it compiles getbuildinfo.c (just like when subwcrev.exe wasn't found). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628061&group_id=5470 From noreply at sourceforge.net Sat Jan 6 15:24:06 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 06:24:06 -0800 Subject: [Patches] [ python-Patches-1624059 ] fast subclasses of builtin types Message-ID: Patches item #1624059, was opened at 2006-12-29 07:01 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
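For orientation before the thread below: the two check styles being compared there can be pictured with plain integers standing in for the C tp_flags word (Python used purely for illustration; the bit assignments are invented, though the mask values mirror the constants visible in the assembler listings, 0xF0000000 and 0x70000000):

    # Illustrative stand-ins for C tp_flags macros; bit layout is made up.
    SUBCLASS_FIELD_MASK = 0xF0000000   # four high bits reserved for the check
    INT_SUBCLASS_CODE   = 0x70000000   # hypothetical encoded value for "int"
    INT_SUBCLASS_BIT    = 1 << 30      # hypothetical one-bit-per-type flag

    def check_encoded_field(tp_flags):
        # "bit mask enumeration": mask out the field, then compare to a code
        # -- the extra compare matches the four-instruction sequence below.
        return (tp_flags & SUBCLASS_FIELD_MASK) == INT_SUBCLASS_CODE

    def check_single_bit(tp_flags):
        # one bit per type: a single AND tested against zero, the
        # two-instruction sequence below; the cost is one flag bit per type.
        return (tp_flags & INT_SUBCLASS_BIT) != 0
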
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Guido van Rossum (gvanrossum) Summary: fast subclasses of builtin types Initial Comment: This is similar to a patch posted on python-dev a few months ago (or more). I modified it to also handle subclassing exceptions which should speed up exception handling a bit. (This was proposed by Guido based on the original patch.) I also dropped an extra bit that was going to indicate if it was a builtin type or a subclass of a builtin type. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 15:24 Message: Logged In: YES user_id=21627 Originator: NO I made a couple of assembler experiments (see attached a.c), with gcc 4.1 on x86. A "bit mask enumeration" test (f) compiles into four instructions:

    movl 8(%eax), %eax
    andl $-268435456, %eax
    cmpl $1879048192, %eax
    je .L18

(fall-through being the else case) A single bit test of a flag (g) compiles to two instructions:

    testl $-1073741824, 8(%eax)
    je .L9

(fall-through being the if case) Adding an identity test (comparison with the address of a global), followed by a bit mask test (h), compiles into six instructions:

    cmpl $int_type, %eax
    je .L2
    movl 8(%eax), %eax
    andl $-268435456, %eax
    cmpl $1879048192, %eax
    je .L2

(fall-through being the else case) In the common case, only two of these instructions are executed. So all-in-all, I would agree with Guido that adding bit flags is more efficient. However, existing bits cannot be recycled: in existing binary extension modules, these flags are set, so if the modules don't get recompiled, the type check would believe that the types are subtypes. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:59 Message: Logged In: YES user_id=6380 Originator: NO This looks fine, but I have some questions about alternative implementations: - Why does the typical PyFoo_Check() macro first call PyFoo_CheckExact() before calling the fast bit checking macro? Did you measure that this is in fact faster? True, it always means a pointer deref, so maybe it is -- but OTOH it is more instructions. - Why not have a separate bit for each type? Then you could make the fast macro test for (flags & mask) != 0 instead of testing for (flags & mask) == value. It would use up all the remaining bits, but I suspect there are some unused (or reusable) bits in lower positions: 1L<<2 is unused (was GC), and 1L<<11 also seems unused. And bits 18 through 23! And I'm guessing that INPLACEOPS (1L<<3) isn't all that interesting any more; they were introduced in 2.0... So it really looks like you have plenty of bits. Of course I don't know if it matters; would be worth it perhaps to look at the machine code. - Oops, it looks like your comment is off. You claim to be using bits 24-27, leaving 28-31 free, but in fact you're using bits 28-31! BTW You're introducing quite a few lines over 80 chars. Perhaps cut back a bit? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-29 07:04 Message: Logged In: YES user_id=33168 Originator: YES I forgot to mention this patch works by using unused bits in tp_flags. This saves a function call when checking for a subclass of a builtin type. There's one funky thing about this patch, the change to Objects/exceptions.c. I didn't investigate why this was necessary, or more likely I did know why when I added it and forgot. I know that without adding BASE_EXC_SUBCLASS to tp_flags, test_exceptions fails. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 From noreply at sourceforge.net Sat Jan 6 15:54:28 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 06:54:28 -0800 Subject: [Patches] [ python-Patches-1624059 ] fast subclasses of builtin types Message-ID: Patches item #1624059, was opened at 2006-12-29 07:01 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Guido van Rossum (gvanrossum) Summary: fast subclasses of builtin types Initial Comment: This is similar to a patch posted on python-dev a few months ago (or more). I modified it to also handle subclassing exceptions which should speed up exception handling a bit. (This was proposed by Guido based on the original patch.) I also dropped an extra bit that was going to indicate if it was a builtin type or a subclass of a builtin type. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 15:54 Message: Logged In: YES user_id=21627 Originator: NO File Added: a.c ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-06 15:24 Message: Logged In: YES user_id=21627 Originator: NO I made a couple of assembler experiments (see attached a.c), with gcc 4.1 on x86. A "bit mask enumeration" test (f) compiles into four instructions:

    movl 8(%eax), %eax
    andl $-268435456, %eax
    cmpl $1879048192, %eax
    je .L18

(fall-through being the else case) A single bit test of a flag (g) compiles to two instructions:

    testl $-1073741824, 8(%eax)
    je .L9

(fall-through being the if case) Adding an identity test (comparison with the address of a global), followed by a bit mask test (h), compiles into six instructions:

    cmpl $int_type, %eax
    je .L2
    movl 8(%eax), %eax
    andl $-268435456, %eax
    cmpl $1879048192, %eax
    je .L2

(fall-through being the else case) In the common case, only two of these instructions are executed. So all-in-all, I would agree with Guido that adding bit flags is more efficient. However, existing bits cannot be recycled: in existing binary extension modules, these flags are set, so if the modules don't get recompiled, the type check would believe that the types are subtypes. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:59 Message: Logged In: YES user_id=6380 Originator: NO This looks fine, but I have some questions about alternative implementations: - Why does the typical PyFoo_Check() macro first call PyFoo_CheckExact() before calling the fast bit checking macro? Did you measure that this is in fact faster? True, it always means a pointer deref, so maybe it is -- but OTOH it is more instructions. - Why not have a separate bit for each type? Then you could make the fast macro test for (flags & mask) != 0 instead of testing for (flags & mask) == value. It would use up all the remaining bits, but I suspect there are some unused (or reusable) bits in lower positions: 1L<<2 is unused (was GC), and 1L<<11 also seems unused. And bits 18 through 23! And I'm guessing that INPLACEOPS (1L<<3) isn't all that interesting any more; they were introduced in 2.0... So it really looks like you have plenty of bits. Of course I don't know if it matters; would be worth it perhaps to look at the machine code. - Oops, it looks like your comment is off. You claim to be using bits 24-27, leaving 28-31 free, but in fact you're using bits 28-31! BTW You're introducing quite a few lines over 80 chars. Perhaps cut back a bit? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-29 07:04 Message: Logged In: YES user_id=33168 Originator: YES I forgot to mention this patch works by using unused bits in tp_flags. This saves a function call when checking for a subclass of a builtin type. There's one funky thing about this patch, the change to Objects/exceptions.c. I didn't investigate why this was necessary, or more likely I did know why when I added it and forgot. I know that without adding BASE_EXC_SUBCLASS to tp_flags, test_exceptions fails. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1624059&group_id=5470 From noreply at sourceforge.net Sat Jan 6 21:05:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 12:05:22 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Tony Lownds (tonylownds) Date: 2007-01-06 20:05 Message: Logged In: YES user_id=24100 Originator: YES Change peepholer to not bail in the presence of EXTENDED_ARG + MAKE_FUNCTION. Enforce the natural 16-bit limit of annotations in compile.c.
File Added: peepholer_and_max_annotations.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 17:53 Message: Logged In: YES user_id=6380 Originator: NO I like the following approach: (1) the old API continues to work for all functions, but provides incomplete information (not losing the kw-only args completely, but losing the fact that they are kw-only); (2) add a new API that provides all the relevant information. Maybe the new API should not return a 7-tuple but rather a structure with named attributes; that makes it more future-proof. Sorry, I don't have any good suggestions for new names. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 07:12 Message: Logged In: YES user_id=24100 Originator: YES For getargs and getargvalues, including the names in positional args is an excellent strategy. There are uses (in cgitb) in the stdlib for getargvalues that then wouldn't need to be changed. The 2 uses of getargspec in the stdlib (one of which I missed, in DocXMLRPCServer) are both closely followed by formatargspec. I think those APIs should change or information will be lost. Alternatively, a new function (hopefully with a better name than getfullargspec :) could be made and getargspec could retain its API, but raise an error when keyword-only arguments are present.

    def getargspec(func):
        args, varargs, kwonlyargs, kwdefaults, varkw, defaults, ann = getfullargspec(func)
        if kwonlyargs:
            raise ValueError, "function has keyword-only arguments, use getfullargspec!"
        return args, varargs, varkw, defaults

I'll update the patch to fix getargvalues and DocXMLRPCServer this weekend. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 05:22 Message: Logged In: YES user_id=6380 Originator: NO Well, it depends on the context whether that matters. The kw-only args could just be included in the positional args (which have names anyway) and that wouldn't be so bad for some apps. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 05:17 Message: Logged In: YES user_id=24100 Originator: YES I think everyone will have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 06:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch.
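To make the division of labor discussed above concrete, here is how the split would look from the caller's side (a sketch only: it assumes the patch's py3k branch, the 7-tuple order follows Tony's comment, and the function f is invented):

    import inspect

    def f(a, b=1, *rest, key=None):
        pass

    # New API: complete information, including the keyword-only 'key'.
    (args, varargs, kwonlyargs, kwdefaults,
     varkw, defaults, ann) = inspect.getfullargspec(f)

    # Old API: refuses instead of silently dropping the kw-only part.
    inspect.getargspec(f)   # would raise ValueError under the sketch above
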
---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 03:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things. The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 01:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 01:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 00:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session: >>> x = object() >>> import sys >>> sys.getrefcount(x) 2 >>> for i in range(100): ... def f(x: x): pass ... >>> del f >>> sys.getrefcount(x) 102 >>> At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 19:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for: def intmin(*a: int) -> int: pass ...help(intmin) will display: intmin(*a: int) -> int File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 15:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 21:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 20:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. 
Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 20:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 22:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 18:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 09:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 08:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? 
In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? (Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-20 01:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 17:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-04 01:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
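Condensing the thread, the observable behavior of the feature is roughly this (a sketch against the p3yk branch with the patch applied, reusing the thread's own intmin example):

    # Annotation expressions are evaluated at 'def' time and collected into
    # the new func_annotations attribute, keyed by argument name, with the
    # return annotation stored under the key 'return'.
    def intmin(*a: int) -> int:
        return min(a)

    intmin.func_annotations   # == {'a': int, 'return': int}
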
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Sat Jan 6 22:03:32 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 13:03:32 -0800 Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax Message-ID: Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower. ---------------------------------------------------------------------- >Comment By: Tony Lownds (tonylownds) Date: 2007-01-06 21:03 Message: Logged In: YES user_id=24100 Originator: YES I tried to implement getargspec() as described, and unfortunately there is another wrinkle to consider. Keyword-only arguments may or may not have defaults. So the invariant described in getargspec()'s docstring can't be maintained when simply appending keyword-only arguments. A tuple of four things is returned: (args, varargs, varkw, defaults). 'args' is a list of the argument names (it may contain nested lists). 'args' will include keyword-only argument names. 'varargs' and 'varkw' are the names of the * and ** arguments or None. 'defaults' is an n-tuple of the default values of the last n arguments. The attached patch adds an 'getfullargspec' API that returns complete information; 'getargspec' raises an error if information would be lost; the order of arguments in 'formatargspec' is backwards compatible, so that formatargspec(*getargspec(f)) == formatargspec(*getfullargspec(f)) when getargspec(f) does not raise an error. PEP 362 could and probably should replace the new getfullargspec() function, so I did not implement an API more complicated than a tuple. File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-06 20:05 Message: Logged In: YES user_id=24100 Originator: YES Change peepholer to not bail in the presence of EXTENDED_ARG + MAKE_FUNCTION. Enforce the natural 16-bit limit of annotations in compile.c. 
File Added: peepholer_and_max_annotations.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 17:53 Message: Logged In: YES user_id=6380 Originator: NO I like the following approach: (1) the old API continues to work for all functions, but provides incomplete information (not losing the kw-only args completely, but losing the fact that they are kw-only); (2) add a new API that provides all the relevant information. Maybe the new API should not return a 7-tuple but rather a structure with named attributes; that makes it more future-proof. Sorry, I don't have any good suggestions for new names. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 07:12 Message: Logged In: YES user_id=24100 Originator: YES For getargs and getargvalues, including the names in positional args is an excellent strategy. There are uses (in cgitb) in the stdlib for getargvalues that then wouldn't need to be changed. The 2 uses of getargspec in the stdlib (one of which I missed, in DocXMLRPCServer) are both closely followed by formatargspec. I think those APIs should change or information will be lost. Alternatively, a new function (hopefully with a better name than getfullargspec :) could be made and getargspec could retain its API, but raise an error when keyword-only arguments are present.

    def getargspec(func):
        args, varargs, kwonlyargs, kwdefaults, varkw, defaults, ann = getfullargspec(func)
        if kwonlyargs:
            raise ValueError, "function has keyword-only arguments, use getfullargspec!"
        return args, varargs, varkw, defaults

I'll update the patch to fix getargvalues and DocXMLRPCServer this weekend. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 05:22 Message: Logged In: YES user_id=6380 Originator: NO Well, it depends on the context whether that matters. The kw-only args could just be included in the positional args (which have names anyway) and that wouldn't be so bad for some apps. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2007-01-04 05:17 Message: Logged In: YES user_id=24100 Originator: YES I think everyone will have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-04 04:30 Message: Logged In: YES user_id=6380 Originator: NO I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway? ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-28 06:53 Message: Logged In: YES user_id=33168 Originator: NO I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch.
---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 03:04 Message: Logged In: YES user_id=24100 Originator: YES Nothing else on the C side of things. The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 01:38 Message: Logged In: YES user_id=6380 Originator: NO Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that? ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-28 01:28 Message: Logged In: YES user_id=24100 Originator: YES Fixed in latest patch. Also added VISIT call for func_annotations. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-28 00:40 Message: Logged In: YES user_id=6380 Originator: NO I believe I've found a leak in the code that adds annotations to a function object. See this session: >>> x = object() >>> import sys >>> sys.getrefcount(x) 2 >>> for i in range(100): ... def f(x: x): pass ... >>> del f >>> sys.getrefcount(x) 102 >>> At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 19:05 Message: Logged In: YES user_id=24100 Originator: YES Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for: def intmin(*a: int) -> int: pass ...help(intmin) will display: intmin(*a: int) -> int File Added: pydoc.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-23 15:53 Message: Logged In: YES user_id=24100 Originator: YES Fixed the non-C89 style lines and the formatting (hopefully in compatible style :) File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-22 21:41 Message: Logged In: YES user_id=6380 Originator: NO Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible. ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-22 20:15 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope 2. 
Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-21 20:21 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Address Neal's comments (I hope) 2. test_scope passes 3. Added some additional tests to test_compiler Open implementation issues: 1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more. File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 22:13 Message: Logged In: YES user_id=24100 Originator: YES Changes: 1. Updated to apply cleanly 2. Fix to compile.c so that test_complex_args passes Open implementation issues: 1. Neal's comments 2. test_scope fails 3. Output from Lib/compiler does not pass test_complex_args File Added: opt_arg_ann.patch ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-20 18:04 Message: Logged In: YES user_id=24100 Originator: YES I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too. Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 09:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 08:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? 
In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? (Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! This is a great start. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-20 01:22 Message: Logged In: YES user_id=6380 Originator: NO Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-11 17:29 Message: Logged In: YES user_id=764593 Originator: NO Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ ---------------------------------------------------------------------- Comment By: Tony Lownds (tonylownds) Date: 2006-12-04 01:24 Message: Logged In: YES user_id=24100 Originator: YES This patch implements optional argument syntax for Python 3000. The patch still has issues: 1. test_ast and test_scope fail. 2. Running the test suite after compiling the library with the compiler package causes failures 3. no docs 4. C-code reference counts and error checking needs a review The syntax implemented is roughly: def f(arg:expr, (nested1:expr, nested2:expr)) -> expr: suite The function object has a new attribute, func_annotations that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470 From noreply at sourceforge.net Sun Jan 7 02:50:18 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 17:50:18 -0800 Subject: [Patches] [ python-Patches-1597850 ] Cross compiling patches for MINGW Message-ID: Patches item #1597850, was opened at 2006-11-16 16:57 Message generated for change (Comment added) made by rmt38 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Han-Wen Nienhuys (hanwen) Assigned to: Nobody/Anonymous (nobody) Summary: Cross compiling patches for MINGW Initial Comment: Hello, attached tarbal is a patch bomb of 32 patches against python 2.5, that we lilypond developers use for crosscompiling python. The patches were originally written by Jan Nieuwenhuizen, my codeveloper. These patches have been tested with Linux/x86, linux/x64 and macos 10.3 as build host and linux-{ppc,x86,x86_64}, freebsd, mingw as target platform. All packages at lilypond.org/install/ except for darwin contain the x-compiled python. Each patch is prefixed with a small comment, but for reference, I include a snippet from the readme. It would be nice if at least some of the patches were included. In particular, I think that X-compiling is a common request, so it warrants inclusion. Basically, what we do is override autoconf and Makefile settings through setting enviroment variables. **README section** Cross Compiling --------------- Python can be cross compiled by supplying different --build and --host parameters to configure. Python is compiled on the "build" system and executed on the "host" system. Cross compiling python requires a native Python on the build host, and a natively compiled tool `Pgen'. Before cross compiling, Python must first be compiled and installed on the build host. The configure script will use `cc' and `python', or environment variables CC_FOR_BUILD or PYTHON_FOR_BUILD, eg: CC_FOR_BUILD=gcc-3.3 \ PYTHON_FOR_BUILD=python2.4 \ .../configure --build=i686-linux --host=i586-mingw32 Cross compiling has been tested under linux, mileage may vary for other platforms. A few reminders on using configure to cross compile: - Cross compile tools must be in PATH, - Cross compile tools must be prefixed with the host type (ie i586-mingw32-gcc, i586-mingw32-ranlib, ...), - CC, CXX, AR, and RANLIB must be undefined when running configure, they will be auto-detected. If you need a cross compiler, Debian ships several several (eg: avr, m68hc1x, mingw32), while dpkg-cross easily creates others. Otherwise, check out Dan Kegel's crosstool: http://www.kegel.com/crosstool . ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 01:50 Message: Logged In: YES user_id=1417949 Originator: NO This: AC_CHECK_FILE(/dev/ptmx, AC_DEFINE(HAVE_DEV_PTMX, 1, [Define if we have /dev/ptmx.])) Is being translated into: echo "$as_me:$LINENO: checking for /dev/ptmx" >&5 echo $ECHO_N "checking for /dev/ptmx... 
&6">
$ECHO_C" >&6 if test "${ac_cv_file__dev_ptmx+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else test "$cross_compiling" = yes && { { echo "$as_me:$LINENO: error: cannot check for file existence when cross compiling" >&5 echo "$as_me: error: cannot check for file existence when cross compiling" >&2;} { (exit 1); exit 1; }; } if test -r "/dev/ptmx"; then ac_cv_file__dev_ptmx=yes else ac_cv_file__dev_ptmx=no fi fi Which exits when I do: $ export CC_FOR_BUILD=gcc $ sh configure --host=arm-eabi With an error like: checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling I am using the latest version of msys/mingw with devkitarm to cross compile. Is this supposed to happen? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:50 Message: Logged In: YES user_id=161998 Originator: YES this is a patch against an SVN checkout from last week. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:48 Message: Logged In: YES user_id=161998 Originator: YES With cross.patch I've been able to build a working freebsd python on linux. Since you had few problems with the X-compile patches, I'm resubmitting those first. I'd like to give our (admittedly: oddball) mingw version another go when the X-compile patches are in python SVN. Regarding your comments: * what would be a better way to import the SO setting? the most reliable way to get something out of a makefile into python is VAR=foo export VAR .. os.environ['VAR'] this doesn't introduce any fragility in parsing/expanding/(un)quoting, so it's actually pretty good (a sketch follows at the end of this thread). Right now, I'm overriding sysconfig wholesale in setup.py with a sysconfig._config_vars.update (os.environ) but I'm not sure that this affects the settings in build_ext.py. A freebsd -> linux compile does not touch that code, so if you dislike it, we can leave it out. * I've documented the .x extension File Added: cross.patch ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:12 Message: Logged In: YES user_id=21627 Originator: NO One more note: it would be best if the patches were against the subversion trunk. They won't be included in the 2.5 maintenance branch (as they are a new feature), so they need to be ported to the trunk, anyway. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:06 Message: Logged In: YES user_id=21627 Originator: NO I'll add my comments as I go through the patches. cab1e7d1e54d14a8aab52f0c3b3073c93f75d4fc: - why is there now a mingw32msvc2 platform? If the target is mingw (rather than Cygwin), I'd expect that the target is just Win32/Windows, and that all symbolic constants provided be usable across all Win32 Pythons. - why is h2py run for /usr/include/netinet/in.h? Shouldn't it operate on a target header file? - please include any plat-* files that you generate in the patch. - why do you need dl_nt.c in Modules? Please make it use the one from PC (consider updating the comment about calling initall) b52dbbbbc3adece61496b161d8c22599caae2311 - please combine all patches adding support for __MINGW32__ into a single one. Why is anything needed here at all? I thought Python compiles already with mingw32 (on Windows)? - what is the exclusion of freezing for?
059af829d362b10bb5921367c93a56dbb51ef31b - Why are you taking timeval from winsock2.h? It should come from sys/time.h, and does in my copy of Debian mingw32-runtime. 6a742fb15b28564f9a1bc916c76a28dc672a9b2c - Why are these changes needed? It's Windows, and that is already supported. a838b4780998ef98ae4880c3916274d45b661c82 - Why doesn't that already work on Windows+cygwin+mingw32? f452fe4b95085d8c1ba838bf302a6a48df3c1d31 - I think this should target msvcr71.dll, not msvcrt.dll Please also combine the cross-compilation patches into a single one. - there is no need to provide pyconfig.h.in changes; I'll regenerate that, anyway. 9c022e407c366c9f175e9168542ccc76eae9e3f0 - please integrate those into the large AC_CHECK_FUNCS that already exists 540684d696df6057ee2c9c4e13e33fe450605ffa - Why are you stripping -Wl? 64f5018e975419b2d37c39f457c8732def3288df - Try getting SO from the Makefile, not from the environment (I assume this is also meant to support true distutils packages some day). 7a4e50fb1cf5ff3481aaf7515a784621cbbdac6c - again: what is the "mingw" platform? 7d3a45788a0d83608d10e5c0a34f08b426d62e92 - is this really necessary? I suggest dropping it 23a2dd14933a2aee69f7cdc9f838e4b9c26c1eea - don't include bits/time.h; it's not meant for direct inclusion 6689ca9dea07afbe8a77b7787a5c4e1642f803a1 - what's a .x file? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-25 15:12 Message: Logged In: YES user_id=161998 Originator: YES I've sent the agreement by snail mail. ---------------------------------------------------------------------- Comment By: Jan Nieuwenhuizen (janneke-sf) Date: 2006-11-17 19:57 Message: Logged In: YES user_id=1368960 Originator: NO I do not mind either. I've just signed and faxed contrib-form.html. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:33 Message: Logged In: YES user_id=161998 Originator: YES note that not all of the patch needs to go in its current form. In particular, setup.py should be much smarter about looking in the build root to find libs and include files. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:32 Message: Logged In: YES user_id=161998 Originator: YES I don't mind, and I expect Jan won't have a problem either. What's the procedure: do we send the disclaimer first, or do you do the review, or does everything happen in parallel? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-11-16 21:47 Message: Logged In: YES user_id=21627 Originator: NO Would you and Jan Nieuwenhuizen be willing to sign the contributor agreement, at http://www.python.org/psf/contrib.html I haven't reviewed the patch yet; if they can be integrated, that will only happen in the trunk (i.e. not for 2.5.x).
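A minimal sketch of the Makefile-to-Python handoff hanwen describes above; the variable name SO comes from the thread, and the fallback value is an assumption:

    # Reading a Makefile-exported setting from the environment.
    # The Makefile side would contain:
    #     SO = .so
    #     export SO
    # after which the value is visible to any Python child process,
    # with no parsing/expanding/(un)quoting of the Makefile needed.
    import os

    so_suffix = os.environ.get("SO", ".so")  # default is an assumption
    print "extension module suffix:", so_suffix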
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 From noreply at sourceforge.net Sun Jan 7 03:37:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 18:37:11 -0800 Subject: [Patches] [ python-Patches-1597850 ] Cross compiling patches for MINGW Message-ID: Patches item #1597850, was opened at 2006-11-16 16:57 Message generated for change (Comment added) made by hanwen You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Han-Wen Nienhuys (hanwen) Assigned to: Nobody/Anonymous (nobody) Summary: Cross compiling patches for MINGW Initial Comment: Hello, the attached tarball is a patch bomb of 32 patches against Python 2.5 that we LilyPond developers use for cross-compiling Python. The patches were originally written by Jan Nieuwenhuizen, my co-developer. These patches have been tested with Linux/x86, Linux/x64 and MacOS 10.3 as build hosts and linux-{ppc,x86,x86_64}, freebsd, mingw as target platforms. All packages at lilypond.org/install/ except for darwin contain the x-compiled python. Each patch is prefixed with a small comment, but for reference, I include a snippet from the README. It would be nice if at least some of the patches were included. In particular, I think that X-compiling is a common request, so it warrants inclusion. Basically, what we do is override autoconf and Makefile settings by setting environment variables. **README section** Cross Compiling --------------- Python can be cross compiled by supplying different --build and --host parameters to configure. Python is compiled on the "build" system and executed on the "host" system. Cross compiling python requires a native Python on the build host, and a natively compiled tool `Pgen'. Before cross compiling, Python must first be compiled and installed on the build host. The configure script will use `cc' and `python', or environment variables CC_FOR_BUILD or PYTHON_FOR_BUILD, e.g.: CC_FOR_BUILD=gcc-3.3 \ PYTHON_FOR_BUILD=python2.4 \ .../configure --build=i686-linux --host=i586-mingw32 Cross compiling has been tested under Linux; mileage may vary for other platforms. A few reminders on using configure to cross compile: - Cross compile tools must be in PATH, - Cross compile tools must be prefixed with the host type (i.e. i586-mingw32-gcc, i586-mingw32-ranlib, ...), - CC, CXX, AR, and RANLIB must be undefined when running configure; they will be auto-detected. If you need a cross compiler, Debian ships several (e.g. avr, m68hc1x, mingw32), while dpkg-cross easily creates others. Otherwise, check out Dan Kegel's crosstool: http://www.kegel.com/crosstool . ---------------------------------------------------------------------- >Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx...
configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx (a sketch follows at the end of this thread). ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 01:50 Message: Logged In: YES user_id=1417949 Originator: NO This: AC_CHECK_FILE(/dev/ptmx, AC_DEFINE(HAVE_DEV_PTMX, 1, [Define if we have /dev/ptmx.])) Is being translated into: echo "$as_me:$LINENO: checking for /dev/ptmx" >&5 echo $ECHO_N "checking for /dev/ptmx... $ECHO_C" >&6 if test "${ac_cv_file__dev_ptmx+set}" = set; then echo $ECHO_N "(cached) $ECHO_C" >&6 else test "$cross_compiling" = yes && { { echo "$as_me:$LINENO: error: cannot check for file existence when cross compiling" >&5 echo "$as_me: error: cannot check for file existence when cross compiling" >&2;} { (exit 1); exit 1; }; } if test -r "/dev/ptmx"; then ac_cv_file__dev_ptmx=yes else ac_cv_file__dev_ptmx=no fi fi Which exits when I do: $ export CC_FOR_BUILD=gcc $ sh configure --host=arm-eabi With an error like: checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling I am using the latest version of msys/mingw with devkitarm to cross compile. Is this supposed to happen? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:50 Message: Logged In: YES user_id=161998 Originator: YES this is a patch against an SVN checkout from last week. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:48 Message: Logged In: YES user_id=161998 Originator: YES With cross.patch I've been able to build a working freebsd python on linux. Since you had few problems with the X-compile patches, I'm resubmitting those first. I'd like to give our (admittedly: oddball) mingw version another go when the X-compile patches are in python SVN. Regarding your comments: * what would be a better way to import the SO setting? the most reliable way to get something out of a makefile into python is VAR=foo export VAR .. os.environ['VAR'] this doesn't introduce any fragility in parsing/expanding/(un)quoting, so it's actually pretty good. Right now, I'm overriding sysconfig wholesale in setup.py with a sysconfig._config_vars.update (os.environ) but I'm not sure that this affects the settings in build_ext.py. A freebsd -> linux compile does not touch that code, so if you dislike it, we can leave it out. * I've documented the .x extension File Added: cross.patch ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:12 Message: Logged In: YES user_id=21627 Originator: NO One more note: it would be best if the patches were against the subversion trunk. They won't be included in the 2.5 maintenance branch (as they are a new feature), so they need to be ported to the trunk, anyway. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:06 Message: Logged In: YES user_id=21627 Originator: NO I'll add my comments as I go through the patches. cab1e7d1e54d14a8aab52f0c3b3073c93f75d4fc: - why is there now a mingw32msvc2 platform? If the target is mingw (rather than Cygwin), I'd expect that the target is just Win32/Windows, and that all symbolic constants provided be usable across all Win32 Pythons. - why is h2py run for /usr/include/netinet/in.h?
Shouldn't it operate on a target header file? - please include any plat-* files that you generate in the patch. - why do you need dl_nt.c in Modules? Please make it use the one from PC (consider updating the comment about calling initall) b52dbbbbc3adece61496b161d8c22599caae2311 - please combine all patches adding support for __MINGW32__ into a single one. Why is anything needed here at all? I thought Python compiles already with mingw32 (on Windows)? - what is the exclusion of freezing for? 059af829d362b10bb5921367c93a56dbb51ef31b - Why are you taking timeval from winsock2.h? It should come from sys/time.h, and does in my copy of Debian mingw32-runtime. 6a742fb15b28564f9a1bc916c76a28dc672a9b2c - Why are these changes needed? It's Windows, and that is already supported. a838b4780998ef98ae4880c3916274d45b661c82 - Why doesn't that already work on Windows+cygwin+mingw32? f452fe4b95085d8c1ba838bf302a6a48df3c1d31 - I think this should target msvcr71.dll, not msvcrt.dll Please also combine the cross-compilation patches into a single one. - there is no need to provide pyconfig.h.in changes; I'll regenerate that, anyway. 9c022e407c366c9f175e9168542ccc76eae9e3f0 - please integrate those into the large AC_CHECK_FUNCS that already exists 540684d696df6057ee2c9c4e13e33fe450605ffa - Why are you stripping -Wl? 64f5018e975419b2d37c39f457c8732def3288df - Try getting SO from the Makefile, not from the environment (I assume this is also meant to support true distutils packages some day). 7a4e50fb1cf5ff3481aaf7515a784621cbbdac6c - again: what is the "mingw" platform? 7d3a45788a0d83608d10e5c0a34f08b426d62e92 - is this really necessary? I suggest dropping it 23a2dd14933a2aee69f7cdc9f838e4b9c26c1eea - don't include bits/time.h; it's not meant for direct inclusion 6689ca9dea07afbe8a77b7787a5c4e1642f803a1 - what's a .x file? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-25 15:12 Message: Logged In: YES user_id=161998 Originator: YES I've sent the agreement by snail mail. ---------------------------------------------------------------------- Comment By: Jan Nieuwenhuizen (janneke-sf) Date: 2006-11-17 19:57 Message: Logged In: YES user_id=1368960 Originator: NO I do not mind either. I've just signed and faxed contrib-form.html. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:33 Message: Logged In: YES user_id=161998 Originator: YES note that not all of the patch needs to go in its current form. In particular, setup.py should be much smarter about looking in the build root to find libs and include files. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:32 Message: Logged In: YES user_id=161998 Originator: YES I don't mind, and I expect Jan won't have a problem either. What's the procedure: do we send the disclaimer first, or do you do the review, or does everything happen in parallel? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-11-16 21:47 Message: Logged In: YES user_id=21627 Originator: NO Would you and Jan Nieuwenhuizen be willing to sign the contributor agreement, at http://www.python.org/psf/contrib.html I haven't reviewed the patch yet; if they can be integrated, that will only happen in the trunk (i.e. not for 2.5.x).
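A hypothetical sketch of the config.cache approach hanwen suggests at the top of this thread; the cache variable name comes from the error output above, and whether the target actually has /dev/ptmx is something the builder must know or assume:

    # Pre-seed autoconf answers that configure cannot probe when cross
    # compiling.  The chosen value ("no" here) is an assumption about
    # the target; adjust it to match the real target system.
    entries = {"ac_cv_file__dev_ptmx": "no"}

    with open("config.cache", "a") as f:
        for name, value in entries.items():
            # autoconf cache entries use the ${var=value} default form
            f.write("%s=${%s=%s}\n" % (name, name, value))

    # Then run, for example:
    #   sh configure --host=arm-eabi --cache-file=config.cache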
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 From noreply at sourceforge.net Sun Jan 7 05:42:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 20:42:22 -0800 Subject: [Patches] [ python-Patches-909005 ] asyncore fixes and improvements Message-ID: Patches item #909005, was opened at 2004-03-03 05:07 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexey Klimkin (klimkin) Assigned to: A.M. Kuchling (akuchling) Summary: asyncore fixes and improvements Initial Comment: Minor: * 0/1 for boolean values replaced with False/True. * (887279) Added handling of POLLPRI as POLLIN. POLLERR, POLLHUP, POLLNVAL are handled as an exception event. handle_expt_event gets the most recent error from the self.socket object and raises socket.error. * Default readable()/writable() returns False. * Added "map" parameter for file_dispatcher. * file_wrapper: removed "return" in close(); recv/read and send/write swapped because of their nature. * mac code for writable() removed. The manual for accept() on mac is similar to the one on linux. * Repeating exception changed from "raise socket.error, why" to raise. * Added connected/accepting/addr reset on close(). Initialization of variables moved to __init__. * close_all() now calls close() for each dispatcher object; EBADF is treated as an already closed socket/file. * Added channel id to "unhandled..." messages. Bugs: * Fixed bug (654766,889153): client never gets connected, nor errored. A connecting client gets a writable event from select(); however, some clients may want to always be non-writable. Such a client may never get connected. The fix adds _readable() - always True for an accepting and always False for a connecting socket; and _writable() - always False for an accepting and always True for a connecting socket. This implies that the listening dispatcher's readable() and writable() will never be called. ("man accept" and "man connect" for non-blocking sockets). * Fixed bug: error handling after accept(). It's said that accept() can return EWOULDBLOCK even for a readable socket. This means that even after handle_accept(), the dispatcher's accept() may still raise EWOULDBLOCK. The new code does accept() itself and stores the accepted socket in self.__pending_accept. If there was a socket.error, it's treated as EWOULDBLOCK. The dispatcher's accept() returns self.__pending_accept and resets it to None. Features: * Added pending_read() and pending_write(). These functions help to use a dispatcher over non-socket objects with buffering capabilities.
In the original dispatcher, if the socket does a buffered read and some data is in the buffer, entering asyncore.poll() doesn't finish, since there is no data in the real file/socket. This feature allows using an SSL socket, since that socket reads data in 16k chunks. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 20:42 Message: Logged In: YES user_id=341410 Originator: NO Many of the changes in the source provided by klimkin in his most recent revision from February 27, 2005 seek to solve certain problems in an inconsistent or incorrect way. Some of his changes (or variants thereof) are worthwhile. I'll start with my issues with his asyncore changes, then describe what I think should be added from them. For example, in his updated asyncore.py, the list of sockets is first shuffled randomly, then sorted based on priority. Assuming that one ignored priorities for a moment, if there were more sockets than the max sockets for the platform, then due to the limitations of randomness, there would be no guarantee that all sockets would get polled. Say, for example, that one were using Windows and were running close to the actual select file handle limit (512 in Python 2.3) with 500 handles; you would skip 436 of the sockets *this pass*. In 10 passes, there would have been 100 sockets that were never polled. In 20 passes, there would still be, on average, 20 that were never polled. So this "randomization" step is the wrong thing to do, unless you actually make multiple select calls for each poll() call. But really, select is limited by 512, and I've run it with 500 without issue. The priority-based sorting has many of the same problems, but it is even worse when you have nontrivial numbers of differing priorities, regardless of randomization. The max socket limit of 64 on Windows isn't correct. It's been 512 since at least Python 2.3. And all other platforms being 65536? No. I've had some versions of Linux die on me at 512, others at 4096, but all were dog slow beyond 500 or so. It's better to let the underlying system raise an exception for the user when it fails and let them attempt to tune it, rather than forcing a tuning that may not be correct. The "pending read" stuff is also misdirected. Assuming a non-broken async client or server, either should be handling content as it comes in, dispatching as necessary. See asynchat.collect_incoming_data() and asynchat.found_terminator() for examples. The idispatcher stuff seems unnecessary. Generally speaking, it seems to me that there are 3 levels of abstraction going on: 1) handle_*_event(), called by poll, poll2, etc. 2) handle_*(), called by handle_*_event(), user overrides, calls other handle_*() and *() methods 3) *() (aka recv, send, close, etc.), called by handle_*(), generally left alone. Some of your code breaks the abstraction and has items in layer 2 call items in layer 1, which then call items in layer 2 again. This seems unnecessary, and breaks the general downward calling semantic (except in the case of errors returned by layer 3 resulting in layer 2 handle_close() calls, which is the proper method to call). There are, according to my reading of the asyncore portions of your included module, a few things that may be worthy of inclusion in the Python standard library: * A variant of your changes to close_all(), though it should proceed in closing everything unless a KeyboardInterrupt, SystemExit, or ExitNow exception is raised.
Socket errors should be ignored, because we are closing them - we don't care about their error condition. * Checking sockets for socket errors via socket.getsockopt() (a sketch follows below). * A variant of your .close() implementation. * The CONNRESET, etc., stuff in the send() and recv() methods, but not the handle_close_event() replacements; stick with handle_close(). * Checking for KeyboardInterrupt and SystemExit inside the poll functions. * The _closed_socket class and initialization. All but the last of the above I would consider to be bugfixes, and if others agree that these are reasonable changes, I'll write up a patch against trunk and 2.5 maintenance. The last change, while I think it would be nice, probably shouldn't be included in 2.5 maintenance, though I think it would be fine for the trunk. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2005-02-26 13:39 Message: Logged In: YES user_id=410460 Minor improvements: * Added handle_close_event(): calls handle_close(), then closes the channel. No need to write self.close() in each handle_close(). * Improved exception handling. KeyboardInterrupt is not blocked. For a Python exception, handle_error_event() is called, which checks for KeyboardInterrupt and closes the socket if handle_error() didn't. Bugs: * Calling connect() could raise an exception and not hit handle_error(). Now if there was an exception, handle_error_event() is called. Features: * set_timeout(): Sets a timeout for the dispatcher object; if there was no I/O for the object, raises ETIMEDOUT, which is handled by handle_error_event(). * Fixed issue with Windows - too many descriptors in select(). The list of sockets is shuffled and only the first asyncore.max_channels are used in select(). * Added set_prio(): Sets a priority for the dispatcher. After the shuffle, the list of sockets is sorted by priority. You may also check asynhttplib - an asynchronous version of httplib. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-07-02 06:44 Message: Logged In: YES user_id=410460 In addition to "[ 909005 ] asyncore fixes and improvements" and CVS version "asyncore.py,v 2.51", this patch provides: * Added handling of buffered socket layer (pending_read(), pending_write()). * Added fd number for __repr__. * Initialized self.socket = socket._closedsocket() instead of None for verbose error output (like closed socket.socket). * asyncore and asynchat implement idispatcher and iasync_chat. * Fixed self.addr initialization. * Removed import exceptions. * Don't filter KeyboardInterrupt, just pass it through. * Added queue of sockets, which solves the problem of select() on too many descriptors. I have run make test in the Python CVS distribution without problems. Examples of using i* included. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-06-05 10:54 Message: Logged In: YES user_id=11375 I've struggled to get the test suite running without errors on my machine, but have failed. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-21 22:15 Message: Logged In: YES user_id=410460 There is no real reason for this change, please undo. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:18 Message: Logged In: YES user_id=11375 In your version of file_dispatch.__init__, the .set_file() call is moved earlier; can you say why?
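The getsockopt() check proposed in the bullet list above can be sketched as follows; the helper name and the localhost port are illustrative assumptions:

    import errno, select, socket

    def pending_error(sock):
        # SO_ERROR returns (and clears) the pending error code on a
        # socket; 0 means no error.  This is the check suggested above.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)

    # Usage sketch: a non-blocking connect to a port that is probably
    # closed, so the deferred failure shows up via SO_ERROR.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setblocking(0)
    s.connect_ex(("127.0.0.1", 9))     # EINPROGRESS/EWOULDBLOCK expected
    select.select([], [s], [], 5.0)    # on POSIX, writable when decided
    code = pending_error(s)
    if code:
        print "connect failed:", errno.errorcode.get(code, code)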
---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:13 Message: Logged In: YES user_id=11375 Added "map" parameter for file_dispatcher and dispatcher_with_send in CVS HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:08 Message: Logged In: YES user_id=11375 Repeating exception changes ('raise socket.error' -> just 'raise') checked into HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:02 Message: Logged In: YES user_id=11375 Mac code for writable() removed from HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:02 Message: Logged In: YES user_id=11375 Patch to use True/False applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 11:55 Message: Logged In: YES user_id=11375 Fix for bug #887279 applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 11:48 Message: Logged In: YES user_id=11375 The sheer number of changes in this patch makes it difficult to figure out which changes fix which problem. I've created a new directory in CVS, nondist/sandbox/asyncore, that contains copies of the module with these patches applied, and will work on applying changes to the copy in dist/src. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-16 23:15 Message: Logged In: YES user_id=410460 Sorry, unfortunately I have lost the old patch file. I have attached a new one. In addition to the fixes listed above, the patch includes: 1. Fix for operating on an uninitialized socket. self.socket now initializes with _closed_socket(), so any operation throws EBADF. 2. Added class idispatcher - a base class for dispatcher. The purpose of this class is to allow simple replacement of media (the dispatcher interface) in classes derived from the dispatcher class. This is based on 'object'. I have also attached asynchat.diff - an example for the new-style dispatcher. The old asynchat works as well. ---------------------------------------------------------------------- Comment By: Wummel (calvin) Date: 2004-03-11 07:49 Message: Logged In: YES user_id=9205 There is no file attached! You have to click on the checkbox next to the upload filename. This is a Sourceforge annoyance :( ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 From noreply at sourceforge.net Sun Jan 7 05:53:55 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 20:53:55 -0800 Subject: [Patches] [ python-Patches-909005 ] asyncore fixes and improvements Message-ID: Patches item #909005, was opened at 2004-03-03 05:07 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexey Klimkin (klimkin) Assigned to: A.M. Kuchling (akuchling) Summary: asyncore fixes and improvements Initial Comment: Minor: * 0/1 for boolean values replaced with False/True. * (887279) Added handling of POLLPRI as POLLIN. POLLERR, POLLHUP, POLLNVAL are handled as an exception event. handle_expt_event gets the most recent error from the self.socket object and raises socket.error. * Default readable()/writable() returns False. * Added "map" parameter for file_dispatcher. * file_wrapper: removed "return" in close(); recv/read and send/write swapped because of their nature. * mac code for writable() removed. The manual for accept() on mac is similar to the one on linux. * Repeating exception changed from "raise socket.error, why" to raise. * Added connected/accepting/addr reset on close(). Initialization of variables moved to __init__. * close_all() now calls close() for each dispatcher object; EBADF is treated as an already closed socket/file. * Added channel id to "unhandled..." messages. Bugs: * Fixed bug (654766,889153): client never gets connected, nor errored. A connecting client gets a writable event from select(); however, some clients may want to always be non-writable. Such a client may never get connected. The fix adds _readable() - always True for an accepting and always False for a connecting socket; and _writable() - always False for an accepting and always True for a connecting socket. This implies that the listening dispatcher's readable() and writable() will never be called. ("man accept" and "man connect" for non-blocking sockets). * Fixed bug: error handling after accept(). It's said that accept() can return EWOULDBLOCK even for a readable socket. This means that even after handle_accept(), the dispatcher's accept() may still raise EWOULDBLOCK. The new code does accept() itself and stores the accepted socket in self.__pending_accept. If there was a socket.error, it's treated as EWOULDBLOCK. The dispatcher's accept() returns self.__pending_accept and resets it to None. Features: * Added pending_read() and pending_write(). These functions help to use a dispatcher over non-socket objects with buffering capabilities. In the original dispatcher, if the socket does a buffered read and some data is in the buffer, entering asyncore.poll() doesn't finish, since there is no data in the real file/socket. This feature allows using an SSL socket, since that socket reads data in 16k chunks. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 20:53 Message: Logged In: YES user_id=341410 Originator: NO In asynchat, the only stuff that should be accepted is the handle_read() changes. The deque removal should be ignored (we have had deques since Python 2.4, and they are *significantly* faster than lists in nontrivial applications), and the iasync_chat stuff, like the idispatcher stuff, seems unnecessary. And that's pretty much it for asynchat. The proposed asynchttp module shouldn't go into the Python standard library until it has lived on its own for a nontrivial amount of time in the Cheeseshop and is found to be as good as httplib, urllib, or urllib2. Even then, its inclusion should be questioned, as medusa (the http server based on asyncore) has been around for a decade or more, is used in many places, and yet still isn't in the standard library. The asyncoreTest.py needs a bit of work (I notice some incorrect names), but could be used as an addition to the test suite (currently it seems as though only asynchat is tested).
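The deque point above is easy to verify; a rough timing sketch (container sizes and iteration counts are arbitrary, and absolute numbers vary by machine):

    # Popping from the front of a list is O(n); deque.popleft() is O(1).
    from timeit import Timer

    list_time = Timer("b.pop(0); b.append(0)",
                      "b = [0] * 10000").timeit(1000)
    deque_time = Timer("b.popleft(); b.append(0)",
                       "from collections import deque; "
                       "b = deque([0] * 10000)").timeit(1000)
    print "list: %.4fs   deque: %.4fs" % (list_time, deque_time)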
---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 20:42 Message: Logged In: YES user_id=341410 Originator: NO Many of the changes in the source provided by klimkin in his most recent revision from February 27, 2005 seek to solve certain problems in an inconsistent or incorrect way. Some of his changes (or variants thereof) are worthwhile. I'll start with my issues with his asyncore changes, then describe what I think should be added from them. For example, in his updated asyncore.py, the list of sockets is first shuffled randomly, then sorted based on priority. Assuming that one ignored priorities for a moment, if there were more sockets than the max sockets for the platform, then due to the limitations of randomness, there would be no guarantee that all sockets would get polled. Say, for example, that one were using Windows and were running close to the actual select file handle limit (512 in Python 2.3) with 500 handles; you would skip 436 of the sockets *this pass*. In 10 passes, there would have been 100 sockets that were never polled. In 20 passes, there would still be, on average, 20 that were never polled. So this "randomization" step is the wrong thing to do, unless you actually make multiple select calls for each poll() call. But really, select is limited by 512, and I've run it with 500 without issue. The priority-based sorting has many of the same problems, but it is even worse when you have nontrivial numbers of differing priorities, regardless of randomization. The max socket limit of 64 on Windows isn't correct. It's been 512 since at least Python 2.3. And all other platforms being 65536? No. I've had some versions of Linux die on me at 512, others at 4096, but all were dog slow beyond 500 or so. It's better to let the underlying system raise an exception for the user when it fails and let them attempt to tune it, rather than forcing a tuning that may not be correct. The "pending read" stuff is also misdirected. Assuming a non-broken async client or server, either should be handling content as it comes in, dispatching as necessary. See asynchat.collect_incoming_data() and asynchat.found_terminator() for examples. The idispatcher stuff seems unnecessary. Generally speaking, it seems to me that there are 3 levels of abstraction going on: 1) handle_*_event(), called by poll, poll2, etc. 2) handle_*(), called by handle_*_event(), user overrides, calls other handle_*() and *() methods 3) *() (aka recv, send, close, etc.), called by handle_*(), generally left alone. Some of your code breaks the abstraction and has items in layer 2 call items in layer 1, which then call items in layer 2 again. This seems unnecessary, and breaks the general downward calling semantic (except in the case of errors returned by layer 3 resulting in layer 2 handle_close() calls, which is the proper method to call). There are, according to my reading of the asyncore portions of your included module, a few things that may be worthy of inclusion in the Python standard library: * A variant of your changes to close_all(), though it should proceed in closing everything unless a KeyboardInterrupt, SystemExit, or ExitNow exception is raised. Socket errors should be ignored, because we are closing them - we don't care about their error condition. * Checking sockets for socket errors via socket.getsockopt(). * A variant of your .close() implementation.
* The CONNRESET, etc., stuff in the send() and recv() methods, but not the handle_close_event() replacements; stick with handle_close(). * Checking for KeyboardInterrupt and SystemExit inside the poll functions. * The _closed_socket class and initialization. All but the last of the above I would consider to be bugfixes, and if others agree that these are reasonable changes, I'll write up a patch against trunk and 2.5 maintenance. The last change, while I think it would be nice, probably shouldn't be included in 2.5 maintenance, though I think it would be fine for the trunk. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2005-02-26 13:39 Message: Logged In: YES user_id=410460 Minor improvements: * Added handle_close_event(): calls handle_close(), then closes the channel. No need to write self.close() in each handle_close(). * Improved exception handling. KeyboardInterrupt is not blocked. For a Python exception, handle_error_event() is called, which checks for KeyboardInterrupt and closes the socket if handle_error() didn't. Bugs: * Calling connect() could raise an exception and not hit handle_error(). Now if there was an exception, handle_error_event() is called. Features: * set_timeout(): Sets a timeout for the dispatcher object; if there was no I/O for the object, raises ETIMEDOUT, which is handled by handle_error_event(). * Fixed issue with Windows - too many descriptors in select(). The list of sockets is shuffled and only the first asyncore.max_channels are used in select(). * Added set_prio(): Sets a priority for the dispatcher. After the shuffle, the list of sockets is sorted by priority. You may also check asynhttplib - an asynchronous version of httplib. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-07-02 06:44 Message: Logged In: YES user_id=410460 In addition to "[ 909005 ] asyncore fixes and improvements" and CVS version "asyncore.py,v 2.51", this patch provides: * Added handling of buffered socket layer (pending_read(), pending_write()). * Added fd number for __repr__. * Initialized self.socket = socket._closedsocket() instead of None for verbose error output (like closed socket.socket). * asyncore and asynchat implement idispatcher and iasync_chat. * Fixed self.addr initialization. * Removed import exceptions. * Don't filter KeyboardInterrupt, just pass it through. * Added queue of sockets, which solves the problem of select() on too many descriptors. I have run make test in the Python CVS distribution without problems. Examples of using i* included. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-06-05 10:54 Message: Logged In: YES user_id=11375 I've struggled to get the test suite running without errors on my machine, but have failed. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-21 22:15 Message: Logged In: YES user_id=410460 There is no real reason for this change, please undo. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:18 Message: Logged In: YES user_id=11375 In your version of file_dispatch.__init__, the .set_file() call is moved earlier; can you say why? ---------------------------------------------------------------------- Comment By: A.M.
Kuchling (akuchling) Date: 2004-03-21 12:13 Message: Logged In: YES user_id=11375 Added "map" parameter for file_dispatcher and dispatcher_with_send in CVS HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:08 Message: Logged In: YES user_id=11375 Repeating exception changes ('raise socket.error' -> just 'raise') checked into HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:02 Message: Logged In: YES user_id=11375 Mac code for writable() removed from HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 12:02 Message: Logged In: YES user_id=11375 Patch to use True/False applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 11:55 Message: Logged In: YES user_id=11375 Fix for bug #887279 applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 11:48 Message: Logged In: YES user_id=11375 The sheer number of changes in this patch makes it difficult to figure out which changes fix which problem. I've created a new directory in CVS, nondist/sandbox/asyncore, that contains copies of the module with these patches applied, and will work on applying changes to the copy in dist/src. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-16 23:15 Message: Logged In: YES user_id=410460 Sorry, unfortunately I have lost the old patch file. I have attached a new one. In addition to the fixes listed above, the patch includes: 1. Fix for operating on an uninitialized socket. self.socket now initializes with _closed_socket(), so any operation throws EBADF. 2. Added class idispatcher - a base class for dispatcher. The purpose of this class is to allow simple replacement of media (the dispatcher interface) in classes derived from the dispatcher class. This is based on 'object'. I have also attached asynchat.diff - an example for the new-style dispatcher. The old asynchat works as well. ---------------------------------------------------------------------- Comment By: Wummel (calvin) Date: 2004-03-11 07:49 Message: Logged In: YES user_id=9205 There is no file attached! You have to click on the checkbox next to the upload filename. This is a Sourceforge annoyance :( ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 From noreply at sourceforge.net Sun Jan 7 06:08:32 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 21:08:32 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 01:37 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 21:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 7 06:19:55 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 21:19:55 -0800 Subject: [Patches] [ python-Patches-1617702 ] extended slicing for buffer objects Message-ID: Patches item #1617702, was opened at 2006-12-17 20:45 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617702&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Nobody/Anonymous (nobody) Summary: extended slicing for buffer objects Initial Comment: Extended slicing support for buffer objects, including slice assignment; I don't know of a way to test assignment, though. (Backported from the p3yk-noslice branch.) ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 21:19 Message: Logged In: YES user_id=341410 Originator: NO As per current trunk source, read-write buffers can only be created via the CPython API.
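To make the complexity question in the lazy-strings item above concrete, here is a toy model of lazy concatenation, in no way the actual patch: addition records its operands in O(1), while indexing walks the recorded tree, so (a + b + c + ...)[i] costs on the order of the number of concatenations:

    class LazyConcat:
        """Toy model (not the actual patch) of a lazily built string."""
        def __init__(self, left, right=""):
            self.left, self.right = left, right

        def __add__(self, other):
            # Concatenation just builds a node; no characters are copied.
            return LazyConcat(self, other)

        def __len__(self):
            return len(self.left) + len(self.right)

        def __getitem__(self, i):
            # Walk the deferred concatenations to locate index i
            # (negative indices are left out of the toy).
            node = self
            while isinstance(node, LazyConcat):
                n = len(node.left)
                if i < n:
                    node = node.left
                else:
                    i -= n
                    node = node.right
            return node[i]

    s = LazyConcat("lazy") + " " + "strings"
    print s[5]   # 's' -- forces a partial walk, not a full copy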
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617702&group_id=5470 From noreply at sourceforge.net Sun Jan 7 06:51:17 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 06 Jan 2007 21:51:17 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 14:51 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Sun Jan 7 10:03:20 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 01:03:20 -0800 Subject: [Patches] [ python-Patches-1609282 ] #1603424 subprocess.py wrongly claims 2.2 compatibility. Message-ID: Patches item #1609282, was opened at 2006-12-05 16:16 Message generated for change (Comment added) made by astrand You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1609282&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Robert Carr (racarr) >Assigned to: Peter Åstrand (astrand) Summary: #1603424 subprocess.py wrongly claims 2.2 compatibility. Initial Comment: Simple fix restoring 2.2 compatibility in subprocess.py. This makes more sense than a list comprehension or constructing sets, in my opinion, even ignoring the bug. ---------------------------------------------------------------------- >Comment By: Peter Åstrand (astrand) Date: 2007-01-07 10:03 Message: Logged In: YES user_id=344921 Originator: NO This patch is rejected, due to the problem described in gbrandl's comment. Another fix has been submitted, though, which solves bug #1603424. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-12-08 21:55 Message: Logged In: YES user_id=849994 Originator: NO This patch changes semantics: if two names refer to the same fd, it is attempted to be closed twice.
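To make the double-close problem concrete, a minimal sketch (the variable names and the use of os.devnull are illustrative only, not the patch's actual code):

    import os

    fd = os.open(os.devnull, os.O_RDONLY)
    handles = {"a": fd, "b": fd}      # two names, one file descriptor
    for h in set(handles.values()):   # deduplicate, as the set-based code does
        os.close(h)
    # Iterating over handles.values() without deduplication would call
    # os.close(fd) twice; the second call raises OSError (EBADF).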
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1609282&group_id=5470 From noreply at sourceforge.net Sun Jan 7 11:58:43 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 02:58:43 -0800 Subject: [Patches] [ python-Patches-1603907 ] subprocess: error redirecting i/o from non-console process Message-ID: Patches item #1603907, was opened at 2006-11-27 18:20 Message generated for change (Comment added) made by astrand You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Oren Tirosh (orenti) Assigned to: Peter Åstrand (astrand) Summary: subprocess: error redirecting i/o from non-console process Initial Comment: In IDLE, PythonWin or other non-console interactive Python under Windows:

    >>> from subprocess import *
    >>> Popen('cmd', stdout=PIPE)
    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        Popen('', stdout=PIPE)
      File "C:\python24\lib\subprocess.py", line 533, in __init__
        (p2cread, p2cwrite,
      File "C:\python24\lib\subprocess.py", line 593, in _get_handles
        p2cread = self._make_inheritable(p2cread)
      File "C:\python24\lib\subprocess.py", line 634, in _make_inheritable
        DUPLICATE_SAME_ACCESS)
    TypeError: an integer is required

The same command in a console window is successful. Why it happens: subprocess assumes that GetStdHandle always succeeds, but when there is no console it returns None. DuplicateHandle then complains about getting a non-integer. This problem does not happen when redirecting all three standard handles. Solution: replace None with -1 (INVALID_HANDLE_VALUE) in _make_inheritable. Patch attached. ---------------------------------------------------------------------- >Comment By: Peter Åstrand (astrand) Date: 2007-01-07 11:58 Message: Logged In: YES user_id=344921 Originator: NO This patch looks very interesting. However, it feels a little bit strange to call DuplicateHandle with a handle of -1. Is this really allowed? What will DuplicateHandle return in this case? INVALID_HANDLE_VALUE? In that case, isn't it better to return INVALID_HANDLE_VALUE directly? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 From noreply at sourceforge.net Sun Jan 7 19:09:42 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 10:09:42 -0800 Subject: [Patches] [ python-Patches-1603907 ] subprocess: error redirecting i/o from non-console process Message-ID: Patches item #1603907, was opened at 2006-11-27 17:20 Message generated for change (Comment added) made by orenti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Oren Tirosh (orenti) Assigned to: Peter Åstrand (astrand) Summary: subprocess: error redirecting i/o from non-console process Initial Comment: In IDLE, PythonWin or other non-console interactive Python under Windows:

    >>> from subprocess import *
    >>> Popen('cmd', stdout=PIPE)
    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        Popen('', stdout=PIPE)
      File "C:\python24\lib\subprocess.py", line 533, in __init__
        (p2cread, p2cwrite,
      File "C:\python24\lib\subprocess.py", line 593, in _get_handles
        p2cread = self._make_inheritable(p2cread)
      File "C:\python24\lib\subprocess.py", line 634, in _make_inheritable
        DUPLICATE_SAME_ACCESS)
    TypeError: an integer is required

The same command in a console window is successful. Why it happens: subprocess assumes that GetStdHandle always succeeds, but when there is no console it returns None. DuplicateHandle then complains about getting a non-integer. This problem does not happen when redirecting all three standard handles. Solution: replace None with -1 (INVALID_HANDLE_VALUE) in _make_inheritable. Patch attached. ---------------------------------------------------------------------- >Comment By: Oren Tirosh (orenti) Date: 2007-01-07 18:09 Message: Logged In: YES user_id=562624 Originator: YES If you duplicate INVALID_HANDLE_VALUE you get a new valid handle to nothing :-) I guess the code really should not rely on this undocumented behavior. The reason I didn't return INVALID_HANDLE_VALUE directly is that DuplicateHandle returns a _subprocess_handle object, not an int. It's expected to have a .Close() method elsewhere in the code. Because of subtle differences in the behavior of the _subprocess and win32api implementations of GetStdHandle in this case, solving this issue gets quite messy! File Added: subprocess-noconsole2.patch ---------------------------------------------------------------------- Comment By: Peter Åstrand (astrand) Date: 2007-01-07 10:58 Message: Logged In: YES user_id=344921 Originator: NO This patch looks very interesting. However, it feels a little bit strange to call DuplicateHandle with a handle of -1. Is this really allowed? What will DuplicateHandle return in this case? INVALID_HANDLE_VALUE? In that case, isn't it better to return INVALID_HANDLE_VALUE directly? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 From noreply at sourceforge.net Sun Jan 7 19:13:19 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 10:13:19 -0800 Subject: [Patches] [ python-Patches-1603907 ] subprocess: error redirecting i/o from non-console process Message-ID: Patches item #1603907, was opened at 2006-11-27 17:20 Message generated for change (Comment added) made by orenti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Oren Tirosh (orenti) Assigned to: Peter Åstrand (astrand) Summary: subprocess: error redirecting i/o from non-console process Initial Comment: In IDLE, PythonWin or other non-console interactive Python under Windows:

    >>> from subprocess import *
    >>> Popen('cmd', stdout=PIPE)
    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        Popen('', stdout=PIPE)
      File "C:\python24\lib\subprocess.py", line 533, in __init__
        (p2cread, p2cwrite,
      File "C:\python24\lib\subprocess.py", line 593, in _get_handles
        p2cread = self._make_inheritable(p2cread)
      File "C:\python24\lib\subprocess.py", line 634, in _make_inheritable
        DUPLICATE_SAME_ACCESS)
    TypeError: an integer is required

The same command in a console window is successful. Why it happens: subprocess assumes that GetStdHandle always succeeds, but when there is no console it returns None. DuplicateHandle then complains about getting a non-integer. This problem does not happen when redirecting all three standard handles. Solution: replace None with -1 (INVALID_HANDLE_VALUE) in _make_inheritable. Patch attached. ---------------------------------------------------------------------- >Comment By: Oren Tirosh (orenti) Date: 2007-01-07 18:13 Message: Logged In: YES user_id=562624 Originator: YES Oops. The new patch does not solve it in all cases in the win32api version, either... ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2007-01-07 18:09 Message: Logged In: YES user_id=562624 Originator: YES If you duplicate INVALID_HANDLE_VALUE you get a new valid handle to nothing :-) I guess the code really should not rely on this undocumented behavior. The reason I didn't return INVALID_HANDLE_VALUE directly is that DuplicateHandle returns a _subprocess_handle object, not an int. It's expected to have a .Close() method elsewhere in the code. Because of subtle differences in the behavior of the _subprocess and win32api implementations of GetStdHandle in this case, solving this issue gets quite messy! File Added: subprocess-noconsole2.patch ---------------------------------------------------------------------- Comment By: Peter Åstrand (astrand) Date: 2007-01-07 10:58 Message: Logged In: YES user_id=344921 Originator: NO This patch looks very interesting. However, it feels a little bit strange to call DuplicateHandle with a handle of -1. Is this really allowed? What will DuplicateHandle return in this case? INVALID_HANDLE_VALUE? In that case, isn't it better to return INVALID_HANDLE_VALUE directly? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1603907&group_id=5470 From noreply at sourceforge.net Sun Jan 7 19:24:24 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 10:24:24 -0800 Subject: [Patches] [ python-Patches-1628205 ] socket.readline() interface doesn't handle EINTR properly Message-ID: Patches item #1628205, was opened at 2007-01-04 21:37 Message generated for change (Comment added) made by orenti You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Maxim Sobolev (sobomax) Assigned to: Nobody/Anonymous (nobody) Summary: socket.readline() interface doesn't handle EINTR properly Initial Comment: The socket.readline() interface doesn't handle EINTR properly. Currently, when EINTR is received, the exception is not handled and all data that was in the buffer is lost. There is no way to recover that data from the code that uses the interface. The correct behaviour would be to catch EINTR and restart recv(). A patch is attached. The following is a real-world example of how it affects the httplib module:

    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1096, in __call__
      return self.__send(self.__name, args)
    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1383, in __request
      verbose=self.__verbose
    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1131, in request
      errcode, errmsg, headers = h.getreply()
    File "/usr/local/lib/python2.4/httplib.py", line 1137, in getreply
      response = self._conn.getresponse()
    File "/usr/local/lib/python2.4/httplib.py", line 866, in getresponse
      response.begin()
    File "/usr/local/lib/python2.4/httplib.py", line 336, in begin
      version, status, reason = self._read_status()
    File "/usr/local/lib/python2.4/httplib.py", line 294, in _read_status
      line = self.fp.readline()
    File "/usr/local/lib/python2.4/socket.py", line 325, in readline
      data = recv(1)
    error: (4, 'Interrupted system call')

-Maxim ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2007-01-07 18:24 Message: Logged In: YES user_id=562624 Originator: NO You may have encountered this on sockets, but *no* Python I/O handles restart on EINTR. The right place to fix this is probably in C, not the Python library. The places where an I/O operation could be interrupted are practically anywhere the GIL is released. This kind of change is likely to be controversial. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 From noreply at sourceforge.net Sun Jan 7 21:36:07 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 12:36:07 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 14:36 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases.
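The idea behind such a class can be sketched in a few lines: buffer data in memory first, then roll over to a real temporary file once the data grows past a threshold. The sketch below only illustrates that concept; it is not djmitche's actual patch, and the class and attribute names are invented:

    from cStringIO import StringIO
    import tempfile

    class SpooledFileSketch(object):
        """Keep data in memory until it exceeds max_size, then move to disk."""

        def __init__(self, max_size=1024):
            self._file = StringIO()
            self._max_size = max_size
            self._rolled = False

        def write(self, data):
            self._file.write(data)
            if not self._rolled and self._file.tell() > self._max_size:
                disk = tempfile.TemporaryFile()
                disk.write(self._file.getvalue())  # copy the spooled data
                self._file = disk
                self._rolled = True

        def __getattr__(self, name):
            # delegate read/seek/close/etc. to the underlying file object
            return getattr(self._file, name)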
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Sun Jan 7 21:37:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 12:37:22 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 14:36 Message generated for change (Comment added) made by djmitche You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases. ---------------------------------------------------------------------- >Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-07 14:37 Message: Logged In: YES user_id=7446 Originator: YES File Added: SpooledTemporaryFile.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Sun Jan 7 22:34:32 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 13:34:32 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 06:51 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects?
To get some data, you can measure a run of a test suite, or a run of IDLE, or of compileall. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Sun Jan 7 23:03:49 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 14:03:49 -0800 Subject: [Patches] [ python-Patches-1597850 ] Cross compiling patches for MINGW Message-ID: Patches item #1597850, was opened at 2006-11-16 16:57 Message generated for change (Comment added) made by rmt38 You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Han-Wen Nienhuys (hanwen) Assigned to: Nobody/Anonymous (nobody) Summary: Cross compiling patches for MINGW Initial Comment: Hello, the attached tarball is a patch bomb of 32 patches against Python 2.5 that we lilypond developers use for cross-compiling Python. The patches were originally written by Jan Nieuwenhuizen, my co-developer. These patches have been tested with Linux/x86, linux/x64 and macos 10.3 as build host and linux-{ppc,x86,x86_64}, freebsd, mingw as target platform. All packages at lilypond.org/install/ except for darwin contain the x-compiled python. Each patch is prefixed with a small comment, but for reference, I include a snippet from the readme. It would be nice if at least some of the patches were included. In particular, I think that X-compiling is a common request, so it warrants inclusion. Basically, what we do is override autoconf and Makefile settings through setting environment variables. **README section** Cross Compiling --------------- Python can be cross compiled by supplying different --build and --host parameters to configure. Python is compiled on the "build" system and executed on the "host" system. Cross compiling python requires a native Python on the build host, and a natively compiled tool `Pgen'. Before cross compiling, Python must first be compiled and installed on the build host. The configure script will use `cc' and `python', or environment variables CC_FOR_BUILD or PYTHON_FOR_BUILD, eg:

    CC_FOR_BUILD=gcc-3.3 \
    PYTHON_FOR_BUILD=python2.4 \
    .../configure --build=i686-linux --host=i586-mingw32

Cross compiling has been tested under linux, mileage may vary for other platforms. A few reminders on using configure to cross compile:
- Cross compile tools must be in PATH,
- Cross compile tools must be prefixed with the host type (ie i586-mingw32-gcc, i586-mingw32-ranlib, ...),
- CC, CXX, AR, and RANLIB must be undefined when running configure, they will be auto-detected.
If you need a cross compiler, Debian ships several (eg: avr, m68hc1x, mingw32), while dpkg-cross easily creates others. Otherwise, check out Dan Kegel's crosstool: http://www.kegel.com/crosstool . ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 22:03 Message: Logged In: YES user_id=1417949 Originator: NO config.cache is not generated or used on my Windows installation of MinGW unless --config-cache is also given as an argument to configure, and from the autoconf documentation this seems to be the default behaviour.
So you might want to amend the instructions to take that into account. Isn't requiring the user to manually create and edit config.cache unnecessary work and confusion for them when it can be addressed in configure.in? Given that checking files is an operation which does not work when cross_compiling is set, and that these checks make configure exit, configure.in could test cross_compiling before trying them and skip them, allowing configure to complete. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 01:50 Message: Logged In: YES user_id=1417949 Originator: NO This:

    AC_CHECK_FILE(/dev/ptmx, AC_DEFINE(HAVE_DEV_PTMX, 1, [Define if we have /dev/ptmx.]))

is being translated into:

    echo "$as_me:$LINENO: checking for /dev/ptmx" >&5
    echo $ECHO_N "checking for /dev/ptmx... $ECHO_C" >&6
    if test "${ac_cv_file__dev_ptmx+set}" = set; then
      echo $ECHO_N "(cached) $ECHO_C" >&6
    else
      test "$cross_compiling" = yes &&
        { { echo "$as_me:$LINENO: error: cannot check for file existence when cross compiling" >&5
      echo "$as_me: error: cannot check for file existence when cross compiling" >&2;}
        { (exit 1); exit 1; }; }
      if test -r "/dev/ptmx"; then
        ac_cv_file__dev_ptmx=yes
      else
        ac_cv_file__dev_ptmx=no
      fi
    fi

which exits when I do:

    $ export CC_FOR_BUILD=gcc
    $ sh configure --host=arm-eabi

with an error like:

    checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling

I am using the latest version of msys/mingw with devkitarm to cross compile. Is this supposed to happen? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:50 Message: Logged In: YES user_id=161998 Originator: YES this is a patch against an SVN checkout of last week. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:48 Message: Logged In: YES user_id=161998 Originator: YES With cross.patch I've been able to build a working freebsd python on linux. Since you had few problems with the X-compile patches, I'm resubmitting those first. I'd like to give our (admittedly: oddball) mingw version another go when the X-compile patches are in python SVN. Regarding your comments: * what would be a better way to import the SO setting? the most reliable way to get something out of a makefile into python is

    VAR=foo
    export VAR
    ..
    os.environ['VAR']

this doesn't introduce any fragility in parsing/expanding/(un)quoting, so it's actually pretty good. Right now, I'm overriding sysconfig wholesale in setup.py with a sysconfig._config_vars.update (os.environ) but I'm not sure that this affects the settings in build_ext.py.
A freebsd -> linux compile does not touch that code, so if you dislike it, we can leave it out. * I've documented the .x extension File Added: cross.patch ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:12 Message: Logged In: YES user_id=21627 Originator: NO One more note: it would be best if the patches were against the subversion trunk. They won't be included in the 2.5 maintenance branch (as they are a new feature), so they need to be ported to the trunk, anyway. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:06 Message: Logged In: YES user_id=21627 Originator: NO I'll add my comments as I go through the patches.

cab1e7d1e54d14a8aab52f0c3b3073c93f75d4fc:
- why is there now a mingw32msvc2 platform? If the target is mingw (rather than Cygwin), I'd expect that the target is just Win32/Windows, and that all symbolic constants provided be usable across all Win32 Pythons.
- why is h2py run for /usr/include/netinet/in.h? Shouldn't it operate on a target header file?
- please include any plat-* files that you generate in the patch.
- why do you need dl_nt.c in Modules? Please make it use the one from PC (consider updating the comment about calling initall)

b52dbbbbc3adece61496b161d8c22599caae2311
- please combine all patches adding support for __MINGW32__ into a single one. Why is anything needed here at all? I thought Python compiles already with mingw32 (on Windows)?
- what is the exclusion of freezing for?

059af829d362b10bb5921367c93a56dbb51ef31b
- Why are you taking timeval from winsock2.h? It should come from sys/time.h, and does in my copy of Debian mingw32-runtime.

6a742fb15b28564f9a1bc916c76a28dc672a9b2c
- Why are these changes needed? It's Windows, and that is already supported.

a838b4780998ef98ae4880c3916274d45b661c82
- Why doesn't that already work on Windows+cygwin+mingw32?

f452fe4b95085d8c1ba838bf302a6a48df3c1d31
- I think this should target msvcr71.dll, not msvcrt.dll. Please also combine the cross-compilation patches into a single one.
- there is no need to provide pyconfig.h.in changes; I'll regenerate that, anyway.

9c022e407c366c9f175e9168542ccc76eae9e3f0
- please integrate those into the large AC_CHECK_FUNCS that already exists

540684d696df6057ee2c9c4e13e33fe450605ffa
- Why are you stripping -Wl?

64f5018e975419b2d37c39f457c8732def3288df
- Try getting SO from the Makefile, not from the environment (I assume this is also meant to support true distutils packages some day).

7a4e50fb1cf5ff3481aaf7515a784621cbbdac6c
- again: what is the "mingw" platform?

7d3a45788a0d83608d10e5c0a34f08b426d62e92
- is this really necessary? I suggest dropping it

23a2dd14933a2aee69f7cdc9f838e4b9c26c1eea
- don't include bits/time.h; it's not meant for direct inclusion

6689ca9dea07afbe8a77b7787a5c4e1642f803a1
- what's a .x file?

---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-25 15:12 Message: Logged In: YES user_id=161998 Originator: YES I've sent the agreement by snailmail. ---------------------------------------------------------------------- Comment By: Jan Nieuwenhuizen (janneke-sf) Date: 2006-11-17 19:57 Message: Logged In: YES user_id=1368960 Originator: NO I do not mind either. I've just signed and faxed contrib-form.html.
---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:33 Message: Logged In: YES user_id=161998 Originator: YES note that not all of the patch needs to go in its current form. In particular, setup.py should be much more clever about looking into the build root to find libs and include files. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:32 Message: Logged In: YES user_id=161998 Originator: YES I don't mind, and I expect Jan won't have a problem either. What's the procedure: do we send the disclaimer first, or do you do the review, or does everything happen in parallel? ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-11-16 21:47 Message: Logged In: YES user_id=21627 Originator: NO Would you and Jan Nieuwenhuizen be willing to sign the contributor agreement, at http://www.python.org/psf/contrib.html I haven't reviewed the patch yet; if they can be integrated, that will only happen in the trunk (i.e. not for 2.5.x). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 From noreply at sourceforge.net Mon Jan 8 04:02:03 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 19:02:03 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and testcases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 04:02:24 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 19:02:24 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and testcases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 04:34:35 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 19:34:35 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and testcases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated.
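For context, a minimal sketch of the cycle this translation is meant to break. It assumes PEP 344 semantics, where a caught exception carries a __traceback__ attribute; the example itself is not from the patch:

    try:
        raise ValueError("boom")
    except ValueError, e:
        pass
    # Without the implicit "e = None; del e", the frame's locals would keep
    # "e" alive, e.__traceback__ would keep the frame alive, and the frame's
    # locals include "e" again -- a reference cycle that delays resource
    # release until the cyclic garbage collector runs.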
---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 04:56:53 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 19:56:53 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 00:51 Message generated for change (Comment added) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- >Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-07 22:56 Message: Logged In: YES user_id=80475 Originator: NO I recommend against this. Any additional specialization code will necessarily slow down other cases handled by PyObject_GetItem. So, the merits of speeding up tuple indexing need to be weighed against the costs (slowing down other code and the excess loading of ceval.c with specialization code). Also, I reject the premise that there is no conceptual difference between list and tuple indexing. The former is a primary use case for lists and the latter is only incidental to tuple use cases (see the endless discussions on python-dev and comp.lang.python about why tuples are not to be regarded as immutable lists and in fact have a different intended set of uses). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-07 16:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects? To get some data, you can measure a run of a test suite, or a run of IDLE, or of compileall.
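The user-visible gap the submitter reports can be reproduced from Python without touching ceval.c; a rough sketch (absolute numbers are machine-dependent, and this only approximates the interpreter-level statistics requested above):

    import timeit
    # list indexing hits the inline shortcut in ceval.c's BINARY_SUBSCR
    print timeit.Timer('a[0]', 'a = [0]').timeit()
    # tuple indexing falls through to the generic PyObject_GetItem path
    print timeit.Timer('a[0]', 'a = (0,)').timeit()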
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Mon Jan 8 06:49:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 21:49:51 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 14:51 Message generated for change (Comment added) made by ocean-city You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- >Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 14:49 Message: Logged In: YES user_id=1200846 Originator: YES Sorry, I want to withdraw this. Python/lib/test/testall.py ===> list: 2541719, tuple: 620815, other: 6174214. The ratio of tuples seems relatively low. File Added: statistics.patch ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-08 12:56 Message: Logged In: YES user_id=80475 Originator: NO I recommend against this. Any additional specialization code will necessarily slow down other cases handled by PyObject_GetItem. So, the merits of speeding up tuple indexing need to be weighed against the costs (slowing down other code and the excess loading of ceval.c with specialization code). Also, I reject the premise that there is no conceptual difference between list and tuple indexing. The former is a primary use case for lists and the latter is only incidental to tuple use cases (see the endless discussions on python-dev and comp.lang.python about why tuples are not to be regarded as immutable lists and in fact have a different intended set of uses). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 06:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects?
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Mon Jan 8 07:58:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 22:58:11 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 14:51 Message generated for change (Comment added) made by ocean-city You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- >Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 15:58 Message: Logged In: YES user_id=1200846 Originator: YES >see the endless discussions on python-dev... Thank you, rhettinger. I'm interested in them; I'll read them. ---------------------------------------------------------------------- Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 14:49 Message: Logged In: YES user_id=1200846 Originator: YES Sorry, I want to withdraw this. Python/lib/test/testall.py ===> list: 2541719, tuple: 620815, other: 6174214. The ratio of tuples seems relatively low. File Added: statistics.patch ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-08 12:56 Message: Logged In: YES user_id=80475 Originator: NO I recommend against this. Any additional specialization code will necessarily slow down other cases handled by PyObject_GetItem. So, the merits of speeding up tuple indexing need to be weighed against the costs (slowing down other code and the excess loading of ceval.c with specialization code). Also, I reject the premise that there is no conceptual difference between list and tuple indexing. The former is a primary use case for lists and the latter is only incidental to tuple use cases (see the endless discussions on python-dev and comp.lang.python about why tuples are not to be regarded as immutable lists and in fact have a different intended set of uses). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 06:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects?
To get some data, you can measure a run of a test suite, or a run of IDLE, or of compileall. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Mon Jan 8 08:19:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 23:19:33 -0800 Subject: [Patches] [ python-Patches-1629718 ] fast tuple[index] by inlining on BINARY_SUBSCR Message-ID: Patches item #1629718, was opened at 2007-01-07 06:51 Message generated for change (Settings changed) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Hirokazu Yamamoto (ocean-city) Assigned to: Nobody/Anonymous (nobody) Summary: fast tuple[index] by inlining on BINARY_SUBSCR Initial Comment: Hello. I noticed there is a speed difference between

    a = [0,]  # list
    a[0]      # fast

and

    a = (0,)  # tuple
    a[0]      # slow

while solving an ICPC puzzle with Python. I thought this was weird because, although a tuple is read-only, there is no conceptual difference between a list and a tuple when extracting an item from them. After investigation, I found this difference comes from the shortcut for list on ceval.c (BINARY_SUBSCR). Is it valuable to put a shortcut for tuple too? I'll attach the patch for the release-maint25 branch. Thank you. ---------------------------------------------------------------------- Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 07:58 Message: Logged In: YES user_id=1200846 Originator: YES >see the endless discussions on python-dev... Thank you, rhettinger. I'm interested in them; I'll read them. ---------------------------------------------------------------------- Comment By: Hirokazu Yamamoto (ocean-city) Date: 2007-01-08 06:49 Message: Logged In: YES user_id=1200846 Originator: YES Sorry, I want to withdraw this. Python/lib/test/testall.py ===> list: 2541719, tuple: 620815, other: 6174214. The ratio of tuples seems relatively low. File Added: statistics.patch ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-08 04:56 Message: Logged In: YES user_id=80475 Originator: NO I recommend against this. Any additional specialization code will necessarily slow down other cases handled by PyObject_GetItem. So, the merits of speeding up tuple indexing need to be weighed against the costs (slowing down other code and the excess loading of ceval.c with specialization code). Also, I reject the premise that there is no conceptual difference between list and tuple indexing. The former is a primary use case for lists and the latter is only incidental to tuple use cases (see the endless discussions on python-dev and comp.lang.python about why tuples are not to be regarded as immutable lists and in fact have a different intended set of uses). ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=21627 Originator: NO It would be helpful to get some statistics on how often this occurs: of all cases of BINARY_SUBSCR, how many refer to tuples, how many to lists, and how many to other objects? To get some data, you can measure a run of a test suite, or a run of IDLE, or of compileall. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629718&group_id=5470 From noreply at sourceforge.net Mon Jan 8 08:45:06 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 23:45:06 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 15:24 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 code page for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created the corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 08:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 08:59:08 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 07 Jan 2007 23:59:08 -0800 Subject: [Patches] [ python-Patches-1597850 ] Cross compiling patches for MINGW Message-ID: Patches item #1597850, was opened at 2006-11-16 16:57 Message generated for change (Comment added) made by hanwen You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Build Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Han-Wen Nienhuys (hanwen) Assigned to: Nobody/Anonymous (nobody) Summary: Cross compiling patches for MINGW Initial Comment: Hello, the attached tarball is a patch bomb of 32 patches against Python 2.5 that we lilypond developers use for cross-compiling Python. The patches were originally written by Jan Nieuwenhuizen, my co-developer. These patches have been tested with Linux/x86, linux/x64 and macos 10.3 as build host and linux-{ppc,x86,x86_64}, freebsd, mingw as target platform. All packages at lilypond.org/install/ except for darwin contain the x-compiled python.
Each patch is prefixed with a small comment, but for reference, I include a snippet from the readme. It would be nice if at least some of the patches were included. In particular, I think that X-compiling is a common request, so it warrants inclusion. Basically, what we do is override autoconf and Makefile settings through setting environment variables. **README section** Cross Compiling --------------- Python can be cross compiled by supplying different --build and --host parameters to configure. Python is compiled on the "build" system and executed on the "host" system. Cross compiling python requires a native Python on the build host, and a natively compiled tool `Pgen'. Before cross compiling, Python must first be compiled and installed on the build host. The configure script will use `cc' and `python', or environment variables CC_FOR_BUILD or PYTHON_FOR_BUILD, eg:

    CC_FOR_BUILD=gcc-3.3 \
    PYTHON_FOR_BUILD=python2.4 \
    .../configure --build=i686-linux --host=i586-mingw32

Cross compiling has been tested under linux, mileage may vary for other platforms. A few reminders on using configure to cross compile:
- Cross compile tools must be in PATH,
- Cross compile tools must be prefixed with the host type (ie i586-mingw32-gcc, i586-mingw32-ranlib, ...),
- CC, CXX, AR, and RANLIB must be undefined when running configure, they will be auto-detected.
If you need a cross compiler, Debian ships several (eg: avr, m68hc1x, mingw32), while dpkg-cross easily creates others. Otherwise, check out Dan Kegel's crosstool: http://www.kegel.com/crosstool . ---------------------------------------------------------------------- >Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-08 07:59 Message: Logged In: YES user_id=161998 Originator: YES Regarding --config-cache, yes, you're correct. Regarding extending configure.in, it does already say "configure: error: cannot check for file existence when cross compiling" and exits. What more would you like it to do? I could add a check that --config-cache is given, although that is not strictly necessary (you can also set the variables in the environment). ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 22:03 Message: Logged In: YES user_id=1417949 Originator: NO config.cache is not generated or used on my Windows installation of MinGW unless --config-cache is also given as an argument to configure, and from the autoconf documentation this seems to be the default behaviour. So you might want to amend the instructions to take that into account. Isn't requiring the user to manually create and edit config.cache unnecessary work and confusion for them when it can be addressed in configure.in? Given that checking files is an operation which does not work when cross_compiling is set, and that these checks make configure exit, configure.in could test cross_compiling before trying them and skip them, allowing configure to complete. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx...
configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2007-01-07 02:37 Message: Logged In: YES user_id=161998 Originator: YES "checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling" You need to set up a config.cache file that contains the correct entry for ac_cv_file__dev_ptmx ---------------------------------------------------------------------- Comment By: Richard Tew (rmt38) Date: 2007-01-07 01:50 Message: Logged In: YES user_id=1417949 Originator: NO This: AC_CHECK_FILE(/dev/ptmx, AC_DEFINE(HAVE_DEV_PTMX, 1, [Define if we have /dev/ptmx.])) Is being translated into:

    echo "$as_me:$LINENO: checking for /dev/ptmx" >&5
    echo $ECHO_N "checking for /dev/ptmx... $ECHO_C" >&6
    if test "${ac_cv_file__dev_ptmx+set}" = set; then
      echo $ECHO_N "(cached) $ECHO_C" >&6
    else
      test "$cross_compiling" = yes &&
        { { echo "$as_me:$LINENO: error: cannot check for file existence when cross compiling" >&5
        echo "$as_me: error: cannot check for file existence when cross compiling" >&2;}
        { (exit 1); exit 1; }; }
      if test -r "/dev/ptmx"; then
        ac_cv_file__dev_ptmx=yes
      else
        ac_cv_file__dev_ptmx=no
      fi
    fi

Which exits when I do: $ export CC_FOR_BUILD=gcc $ sh configure --host=arm-eabi With an error like: checking for /dev/ptmx... configure: error: cannot check for file existence when cross compiling I am using the latest version of msys/mingw with devkitarm to cross compile. Is this supposed to happen? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:50 Message: Logged In: YES user_id=161998 Originator: YES this is a patch against an SVN checkout of last week. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-09 23:48 Message: Logged In: YES user_id=161998 Originator: YES With cross.patch I've been able to build a working freebsd python on linux. Since you had few problems with the X-compile patches, I'm resubmitting those first. I'd like to give our (admittedly: oddball) mingw version another go when the X-compile patches are in python SVN. Regarding your comments: * what would be a better way to import the SO setting? the most reliable way to get something out of a makefile into python is

    VAR=foo
    export VAR
    ..
    os.environ['VAR']

this doesn't introduce any fragility in parsing/expanding/(un)quoting, so it's actually pretty good. Right now, I'm overriding sysconfig wholesale in setup.py with a sysconfig._config_vars.update (os.environ) but I'm not sure that this affects the settings in build_ext.py. A freebsd -> linux compile does not touch that code, so if you dislike it, we can leave it out. * I've documented the .x extension File Added: cross.patch ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-06 20:12 Message: Logged In: YES user_id=21627 Originator: NO One more note: it would be best if the patches were against the subversion trunk. They won't be included in the 2.5 maintenance branch (as they are a new feature), so they need to be ported to the trunk, anyway. ---------------------------------------------------------------------- Comment By: Martin v.
Löwis (loewis) Date: 2006-12-06 20:06 Message: Logged In: YES user_id=21627 Originator: NO I'll add my comments as I go through the patches. cab1e7d1e54d14a8aab52f0c3b3073c93f75d4fc: - why is there now a mingw32msvc2 platform? If the target is mingw (rather than Cygwin), I'd expect that the target is just Win32/Windows, and that all symbolic constants provided be usable across all Win32 Pythons. - why is h2py run for /usr/include/netinet/in.h? Shouldn't it operate on a target header file? - please include any plat-* files that you generate in the patch. - why do you need dl_nt.c in Modules? Please make it use the one from PC (consider updating the comment about calling initall) b52dbbbbc3adece61496b161d8c22599caae2311 - please combine all patches adding support for __MINGW32__ into a single one. Why is anything needed here at all? I thought Python compiles already with mingw32 (on Windows)? - what is the exclusion of freezing for? 059af829d362b10bb5921367c93a56dbb51ef31b - Why are you taking timeval from winsock2.h? It should come from sys/time.h, and does in my copy of Debian mingw32-runtime. 6a742fb15b28564f9a1bc916c76a28dc672a9b2c - Why are these changes needed? It's Windows, and that is already supported. a838b4780998ef98ae4880c3916274d45b661c82 - Why doesn't that already work on Windows+cygwin+mingw32? f452fe4b95085d8c1ba838bf302a6a48df3c1d31 - I think this should target msvcr71.dll, not msvcrt.dll Please also combine the cross-compilation patches into a single one. - there is no need to provide pyconfig.h.in changes; I'll regenerate that, anyway. 9c022e407c366c9f175e9168542ccc76eae9e3f0 - please integrate those into the large AC_CHECK_FUNCS that already exists 540684d696df6057ee2c9c4e13e33fe450605ffa - Why are you stripping -Wl? 64f5018e975419b2d37c39f457c8732def3288df - Try getting SO from the Makefile, not from the environment (I assume this is also meant to support true distutils packages some day). 7a4e50fb1cf5ff3481aaf7515a784621cbbdac6c - again: what is the "mingw" platform? 7d3a45788a0d83608d10e5c0a34f08b426d62e92 - is this really necessary? I suggest dropping it 23a2dd14933a2aee69f7cdc9f838e4b9c26c1eea - don't include bits/time.h; it's not meant for direct inclusion 6689ca9dea07afbe8a77b7787a5c4e1642f803a1 - what's a .x file? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-25 15:12 Message: Logged In: YES user_id=161998 Originator: YES I've sent the agreement by snailmail. ---------------------------------------------------------------------- Comment By: Jan Nieuwenhuizen (janneke-sf) Date: 2006-11-17 19:57 Message: Logged In: YES user_id=1368960 Originator: NO I do not mind either. I've just signed and faxed contrib-form.html. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:33 Message: Logged In: YES user_id=161998 Originator: YES note that not all of the patch needs to go in its current form. In particular, setup.py should be much cleverer about looking in the build root for finding libs and include files. ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-11-17 00:32 Message: Logged In: YES user_id=161998 Originator: YES I don't mind, and I expect Jan won't have a problem either. What's the procedure: do we send the disclaimer first, or do you do the review, or does everything happen in parallel?
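For readers following the config.cache discussion above: a minimal, hypothetical helper that pre-seeds the autoconf cache before running configure. Only the cache variable ac_cv_file__dev_ptmx and the CC_FOR_BUILD/PYTHON_FOR_BUILD variables come from this thread; the script itself is an illustration, not part of the submitted patches.

    # Hypothetical helper: seed config.cache with answers configure cannot
    # determine when cross-compiling, then run configure with --config-cache.
    import os
    import subprocess

    SEED = {
        # mingw targets have no /dev/ptmx, so "no" is the right answer there;
        # adjust for your own target platform.
        "ac_cv_file__dev_ptmx": "no",
    }

    def write_config_cache(path="config.cache"):
        f = open(path, "a")
        for name, value in SEED.items():
            f.write("%s=%s\n" % (name, value))  # config.cache is sourced shell
        f.close()

    def run_configure(build, host):
        env = dict(os.environ)
        env["CC_FOR_BUILD"] = "gcc"            # native compiler on the build host
        env["PYTHON_FOR_BUILD"] = "python2.4"  # native Python, as the README says
        subprocess.call(["sh", "configure", "--config-cache",
                         "--build=" + build, "--host=" + host], env=env)

    if __name__ == "__main__":
        write_config_cache()
        run_configure("i686-linux", "i586-mingw32")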
---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-11-16 21:47 Message: Logged In: YES user_id=21627 Originator: NO Would you and Jan Nieuwenhuizen be willing to sign the contributor agreement, at http://www.python.org/psf/contrib.html I haven't reviewed the patch yet; if they can be integrated, that will only happen in the trunk (i.e. not for 2.5.x). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1597850&group_id=5470 From noreply at sourceforge.net Mon Jan 8 09:26:44 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 00:26:44 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 20:36 Message generated for change (Comment added) made by arigo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases. ---------------------------------------------------------------------- >Comment By: Armin Rigo (arigo) Date: 2007-01-08 08:26 Message: Logged In: YES user_id=4771 Originator: NO The __getattr__ magic makes the following kind of code fail with SpooledTemporaryFile:

    f = SpooledTemporaryFile(max_size=something)
    rd = f.read
    wr = f.write
    for x in y:
        ...use rd(size) and wr(data)...

The problem is that the captured 'f.read' method is the one from the StringIO instance, even after the write() rolled the file over to disk. Given that capturing bound methods is a semi-official speed hack advertised in some respected places, we might have to be careful about it. About such matters I am biased towards first getting it right and then getting it fast... Also, Python 2.5 is already out, so this will probably be a 2.6 addition. ---------------------------------------------------------------------- Comment By: Dustin J.
Mitchell (djmitche) Date: 2007-01-07 20:37 Message: Logged In: YES user_id=7446 Originator: YES File Added: SpooledTemporaryFile.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Mon Jan 8 11:26:39 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 02:26:39 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 15:24 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 codepage for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created a corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 11:26 Message: Logged In: YES user_id=38388 Originator: NO Please provide a reference defining the encoding. The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx but that doesn't provide the mapping table. Thanks. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 08:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 11:51:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 02:51:51 -0800 Subject: [Patches] [ python-Patches-1628205 ] socket.readline() interface doesn't handle EINTR properly Message-ID: Patches item #1628205, was opened at 2007-01-04 13:37 Message generated for change (Comment added) made by sobomax You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Maxim Sobolev (sobomax) Assigned to: Nobody/Anonymous (nobody) Summary: socket.readline() interface doesn't handle EINTR properly Initial Comment: The socket.readline() interface doesn't handle EINTR properly. Currently, when EINTR is received, the exception is not handled and all data that was in the buffer is lost. There is no way to recover that data from the code that uses the interface.
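For illustration, the fix described next - catch EINTR and restart recv() - can be sketched in pure Python. This is a sketch of the idea only, not the attached patch; the helper name is hypothetical.

    # Restart recv() when it is interrupted by a signal, so buffered data
    # is not lost. Illustrative only; the real patch changes socket.py.
    import errno
    import socket

    def recv_retry(sock, size):
        while True:
            try:
                return sock.recv(size)
            except socket.error, why:
                if why[0] == errno.EINTR:
                    continue  # interrupted by a signal: just retry
                raise         # any other error is propagated unchanged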
Correct behaviour would be to catch EINTR and restart recv(). Patch is attached. Following is a real-world example of how it affects the httplib module:

    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1096, in __call__
        return self.__send(self.__name, args)
    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1383, in __request
        verbose=self.__verbose
    File "/usr/local/lib/python2.4/xmlrpclib.py", line 1131, in request
        errcode, errmsg, headers = h.getreply()
    File "/usr/local/lib/python2.4/httplib.py", line 1137, in getreply
        response = self._conn.getresponse()
    File "/usr/local/lib/python2.4/httplib.py", line 866, in getresponse
        response.begin()
    File "/usr/local/lib/python2.4/httplib.py", line 336, in begin
        version, status, reason = self._read_status()
    File "/usr/local/lib/python2.4/httplib.py", line 294, in _read_status
        line = self.fp.readline()
    File "/usr/local/lib/python2.4/socket.py", line 325, in readline
        data = recv(1)
    error: (4, 'Interrupted system call')

-Maxim ---------------------------------------------------------------------- >Comment By: Maxim Sobolev (sobomax) Date: 2007-01-08 02:51 Message: Logged In: YES user_id=24670 Originator: YES Well, it's not quite correct, since for example httplib.py tries to handle EINTR. The fundamental problem with socket.readline() is that it does internal buffering, so that getting EINTR results in data being lost. I don't think it has to be fixed in C, since recv() is a very low-level interface and it is expected to return EINTR on a signal, so that "fixing" it there could possibly break software that relies on this behaviour. And I don't quite buy your reasoning - "since it's broken in a few more places, let's keep it consistently broken everywhere". To me it sounds like an attempt to bury one's head in the sand instead of facing the problem at hand. Fixing socket.readline() may be the first step in improving the library to handle this condition properly. ---------------------------------------------------------------------- Comment By: Oren Tirosh (orenti) Date: 2007-01-07 10:24 Message: Logged In: YES user_id=562624 Originator: NO You may have encountered this on sockets, but *all* Python I/O fails to handle restart on EINTR. The right place to fix this is probably in C, not the Python library. The places where an I/O operation could be interrupted are practically anywhere the GIL is released. This kind of change is likely to be controversial. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1628205&group_id=5470 From noreply at sourceforge.net Mon Jan 8 11:59:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 02:59:50 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 10:37 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000.
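For readers new to the idea, a rough pure-Python model of "lazy concatenation" follows. The real patches implement this inside the C Unicode type; the class below and its render() method are purely illustrative, not code from the patches.

    # '+' is O(1): it only records the operands. The joined string is built
    # once, on demand, with a single linear pass over the pieces.
    class LazyConcat(object):
        def __init__(self, *parts):
            self._parts = list(parts)  # pending pieces, not yet joined
            self._value = None         # rendered result, filled in lazily

        def __add__(self, other):
            return LazyConcat(self, other)  # constant time: no copying here

        def render(self):
            if self._value is None:
                self._value = u"".join(
                    p.render() if isinstance(p, LazyConcat) else p
                    for p in self._parts)
                self._parts = None
            return self._value

    s = LazyConcat(u"spam") + u" and " + u"eggs"
    assert s.render() == u"spam and eggs"  # the join happens here, once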
I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 11:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 06:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Mon Jan 8 13:47:08 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 04:47:08 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 16:24 Message generated for change (Comment added) made by bialix You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 codepage for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created a corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation. ---------------------------------------------------------------------- >Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 14:47 Message: Logged In: YES user_id=957594 Originator: YES When I started working on cp720, I searched Google for it. I found this presentation with an actual map of the characters: http://stanley.cs.toronto.edu/presentations/2005-winter/unicode.ppt Then I tried to search for a CP720.txt file and found this page: http://www.haible.de/bruno/charsets/conversion-tables/Arabic-other.html I downloaded the archive from that page and used its CP720.txt to generate cp720.py. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 12:26 Message: Logged In: YES user_id=38388 Originator: NO Please provide a reference defining the encoding. The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx but that doesn't provide the mapping table. Thanks. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 09:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 14:33:18 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 05:33:18 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 16:24 Message generated for change (Comment added) made by bialix You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 codepage for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created a corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation.
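For illustration, a rough sketch of how a CP720.txt-style mapping file can be turned into the 256-entry decoding table a charmap codec needs. The real integration path, as loewis notes above, is Tools/unicode (its gencodec.py generates the whole codec module); the code below only assumes the common "0xXX 0xYYYY # comment" mapping-file layout and is not from the patch.

    # Build a decoding table from a "byte <tab> unicode <tab> # name" file.
    def load_decoding_table(path):
        table = [u'\ufffe'] * 256          # U+FFFE marks undefined slots
        for line in open(path):
            line = line.split('#', 1)[0].strip()
            if not line:
                continue                   # blank or comment-only line
            fields = line.split()
            if len(fields) < 2:
                continue                   # byte with no Unicode mapping
            byte, code = int(fields[0], 16), int(fields[1], 16)
            table[byte] = unichr(code)
        return u''.join(table)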
---------------------------------------------------------------------- >Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 15:33 Message: Logged In: YES user_id=957594 Originator: YES File Added: CP720.TXT ---------------------------------------------------------------------- Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 14:47 Message: Logged In: YES user_id=957594 Originator: YES When I started working on cp720, I searched Google for it. I found this presentation with an actual map of the characters: http://stanley.cs.toronto.edu/presentations/2005-winter/unicode.ppt Then I tried to search for a CP720.txt file and found this page: http://www.haible.de/bruno/charsets/conversion-tables/Arabic-other.html I downloaded the archive from that page and used its CP720.txt to generate cp720.py. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 12:26 Message: Logged In: YES user_id=38388 Originator: NO Please provide a reference defining the encoding. The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx but that doesn't provide the mapping table. Thanks. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 09:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 14:47:10 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 05:47:10 -0800 Subject: [Patches] [ python-Patches-1616979 ] cp720 encoding map Message-ID: Patches item #1616979, was opened at 2006-12-16 16:24 Message generated for change (Comment added) made by bialix You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexander Belchenko (bialix) Assigned to: Nobody/Anonymous (nobody) Summary: cp720 encoding map Initial Comment: I'm working on the Bazaar (bzr) VCS. One of our users reported a bug that occurs because his Windows XP machine uses the cp720 codepage for the DOS console. cp720 is the OEM Arabic codepage. The Python standard library does not have an encoding map for this encoding, so I created a corresponding one. The attached patch provides a cp720.py file for the encodings package and mentions this encoding in the documentation.
---------------------------------------------------------------------- >Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 15:47 Message: Logged In: YES user_id=957594 Originator: YES Here is the map on the Microsoft site: http://www.microsoft.com/globaldev/reference/oem/720.mspx ---------------------------------------------------------------------- Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 15:33 Message: Logged In: YES user_id=957594 Originator: YES File Added: CP720.TXT ---------------------------------------------------------------------- Comment By: Alexander Belchenko (bialix) Date: 2007-01-08 14:47 Message: Logged In: YES user_id=957594 Originator: YES When I started working on cp720, I searched Google for it. I found this presentation with an actual map of the characters: http://stanley.cs.toronto.edu/presentations/2005-winter/unicode.ppt Then I tried to search for a CP720.txt file and found this page: http://www.haible.de/bruno/charsets/conversion-tables/Arabic-other.html I downloaded the archive from that page and used its CP720.txt to generate cp720.py. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 12:26 Message: Logged In: YES user_id=38388 Originator: NO Please provide a reference defining the encoding. The only reference I could find was http://msdn2.microsoft.com/en-us/library/system.text.encoding(vs.80).aspx but that doesn't provide the mapping table. Thanks. ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2007-01-08 09:45 Message: Logged In: YES user_id=21627 Originator: NO Where did you get CP720.txt from? Just generating the file is not good enough: it must be integrated somehow into Tools/unicode. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1616979&group_id=5470 From noreply at sourceforge.net Mon Jan 8 16:53:26 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 07:53:26 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 14:36 Message generated for change (Comment added) made by djmitche You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases. ---------------------------------------------------------------------- >Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-08 09:53 Message: Logged In: YES user_id=7446 Originator: YES I agree it would break in such a situation, but I'm not clear on which direction your bias leads you (specifically, which do we get right -- don't use bound methods, or don't use the __getattr__ magic?).
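A minimal sketch of the failure mode arigo describes, using a hypothetical toy stand-in for SpooledTemporaryFile; none of these names are from the patch, and the rollover logic is simplified for illustration.

    from StringIO import StringIO
    import tempfile

    class ToySpooledFile(object):
        def __init__(self, max_size):
            self._file = StringIO()
            self._max = max_size

        def __getattr__(self, name):
            return getattr(self._file, name)   # delegate read/seek/etc.

        def write(self, data):
            self._file.write(data)
            if self._file.tell() > self._max:
                real = tempfile.TemporaryFile()
                real.write(self._file.getvalue())
                self._file = real              # rollover: swap backing file

    f = ToySpooledFile(max_size=10)
    rd = f.read                 # bound method of the StringIO backing file
    f.write("x" * 100)          # triggers rollover to a real temp file
    f.seek(0)                   # seeks the on-disk file
    print repr(rd())            # '' - the stale StringIO, data is missed
    print repr(f.read())        # 'xxx...' - a fresh lookup sees the real file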
I could fix this by defining "proxy" functions (and some properties) for the whole file interface, rather than just the methods that potentially trigger rollover. That would lose a little efficiency, but mostly only in reading (e.g., calling f.read() will always result in two function applications; in the current model, after the first call it runs at "native" speed). It would also lose forward compatibility if the file protocol changes, although I'm not sure how likely that is. Would you like me to do that? ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2007-01-08 02:26 Message: Logged In: YES user_id=4771 Originator: NO The __getattr__ magic makes the following kind of code fail with SpooledTemporaryFile:

    f = SpooledTemporaryFile(max_size=something)
    rd = f.read
    wr = f.write
    for x in y:
        ...use rd(size) and wr(data)...

The problem is that the captured 'f.read' method is the one from the StringIO instance, even after the write() rolled the file over to disk. Given that capturing bound methods is a semi-official speed hack advertised in some respected places, we might have to be careful about it. About such matters I am biased towards first getting it right and then getting it fast... Also, Python 2.5 is already out, so this will probably be a 2.6 addition. ---------------------------------------------------------------------- Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-07 14:37 Message: Logged In: YES user_id=7446 Originator: YES File Added: SpooledTemporaryFile.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Mon Jan 8 17:49:42 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 08:49:42 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and test cases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour.
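For illustration, a small sketch of the cycle this translation is meant to break. The code is hypothetical, written in Python 2 style with the traceback captured explicitly; under PEP 344 the exception itself would carry the traceback, so the name 'e' alone would form such a cycle.

    import sys

    def leaky():
        big = "x" * 10 ** 6          # stays alive as long as the frame does
        try:
            raise ValueError("boom")
        except ValueError, e:
            # Storing the traceback in a local creates the cycle by hand:
            # tb -> frame -> locals -> tb. Refcounting alone can no longer
            # free the frame (or 'big'); only the cycle collector can.
            tb = sys.exc_info()[2]
            return None

    def fixed():
        try:
            raise ValueError("boom")
        except ValueError, e:
            try:
                pass                  # body
            finally:
                e = None              # the compiler-inserted cleanup
                del e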
---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:49 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated. ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 17:50:00 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 08:50:00 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and test cases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:50 Message: Logged In: YES user_id=1344176 Originator: YES File Added: exc_cleanup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:49 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated.
---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 17:51:57 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 08:51:57 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and test cases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:51 Message: Logged In: YES user_id=1344176 Originator: YES Patches updated in response to PJE's comment (http://mail.python.org/pipermail/python-3000/2007-January/005430.html): """In the tuple or list case, there's no need to reset the variables, because then the traceback won't be present any more; the exception object will have been discarded after unpacking.""" ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:50 Message: Logged In: YES user_id=1344176 Originator: YES File Added: exc_cleanup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:49 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated.
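For illustration, the "tuple or list case" PJE refers to is Python 2's unpacking except target; a short hypothetical example:

    # The except target can be an unpacking pattern. The exception object
    # itself is never bound to a name, so there is no reference to reset
    # and no cleanup needs to be inserted for this form.
    try:
        open("/no/such/file")
    except IOError, (errno_, strerror):
        # Only the unpacked values survive; the IOError instance (and any
        # traceback it might carry under PEP 344) is discarded after
        # unpacking.
        print errno_, strerror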
---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Mon Jan 8 19:50:02 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 10:50:02 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemberg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). 
But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often references using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Mon Jan 8 21:44:57 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 12:44:57 -0800 Subject: [Patches] [ python-Patches-909005 ] asyncore fixes and improvements Message-ID: Patches item #909005, was opened at 2004-03-03 16:07 Message generated for change (Comment added) made by klimkin You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Alexey Klimkin (klimkin) Assigned to: A.M. Kuchling (akuchling) Summary: asyncore fixes and improvements Initial Comment: Minor: * 0/1 for boolean values replaced with False/True. * (887279) Added handling of POLLPRI as POLLIN. POLLERR, POLLHUP, POLLNVAL are handled as exception event. handle_expt_event gets recent error from self.socket object and raises socket.error. * Default readable()/writable() returns False. * Added "map" parameter for file_dispatcher. * file_wrapper: removed "return" in close(), recv/read and send/write swapped because of their nature. * mac code for writable() removed. Manual for accept() on mac is similar to the one on linux. * Repeating exception changed from "raise socket.error, why" to raise. * Added connected/accepting/addr reset on close(). 
Initialization of variables moved to __init__. * close_all() now calls close() for each dispatcher object; EBADF is treated as an already-closed socket/file. * Added channel id to "unhandled..." messages. Bugs: * Fixed bug (654766,889153): the client never gets connected and never gets an error. A connecting client gets a writable event from select(); however, some clients may want to always be non-writable. Such a client may never get connected. The fix adds _readable() - always True for an accepting and always False for a connecting socket - and _writable() - always False for an accepting and always True for a connecting socket. This implies that a listening dispatcher's readable() and writable() will never be called. ("man accept" and "man connect" for non-blocking sockets.) * Fixed bug: error handling after accept(). It's said that accept() can return EWOULDBLOCK even for a readable socket. This means that even after handle_accept(), the dispatcher's accept() may still raise EWOULDBLOCK. The new code does the accept() itself and stores the accepted socket in self.__pending_accept. If there was a socket.error, it's treated as EWOULDBLOCK. The dispatcher's accept() returns self.__pending_accept and resets it to None. Features: * Added pending_read() and pending_write(). These functions help to use a dispatcher over non-socket objects with buffering capabilities. In the original dispatcher, if the socket does a buffered read and some data is in the buffer, entering asyncore.poll() doesn't finish, since there is no data in the real file/socket. This feature allows using SSL sockets, since such a socket reads data in 16k chunks. ---------------------------------------------------------------------- >Comment By: Alexey Klimkin (klimkin) Date: 2007-01-08 23:44 Message: Logged In: YES user_id=410460 Originator: YES 1) The patch was developed not during academic research, but during the coding of true non-blocking client-server applications capable of running on both Linux and Windows. The original code had a lot of issues with everything: some parts were not truly non-blocking, not every socket could be passed, there were issues under high load, etc. 2) We used medusa for SSL capability in our project. However, it's impossible to get fully non-blocking functionality with the original asyncore and the original medusa, so the functionality was extended to support these features as well. That is what idispatcher is for. 3) In the end we got pretty reliable code, which supports the features I described here and has tons of bugs and issues fixed. Again, I didn't fix bugs for any academic purpose - every fix was driven by a real issue we met during development. I also don't think that these fixes are bound too tightly to our project - I believe I made them pretty general. 4) It's possible that some parts could be made better for other applications. But if you follow the same path - developing a truly non-blocking client-server with medusa's SSL capabilities - I think you will end up with the same thing. 5) I don't insist on including the patch in the Python tree as is. I'm quite happy using the modified asyncore in my private library. My intention was to share my experience. Please use it if you need to. 6) The development I mention above was in 2004, so the patch has been out of sync with reality for 2 years already. Some issues it was solving may be gone by now. I also don't know what is going on with SSL for Python - there seem to be new libraries as well. ...so... just use it as you want... or as you don't want ;) ...
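A simplified, hypothetical sketch of the accept() fix described in the bug list above - do the accept() at event time, treat socket.error as EWOULDBLOCK, and hand the already-accepted socket out later. Only the attribute name __pending_accept comes from the description; this is not the patch itself.

    import socket

    class AcceptingDispatcher(object):
        def __init__(self, sock):
            self.socket = sock
            self.__pending_accept = None

        def handle_read_event(self):
            # Called when select() says the listening socket is readable.
            try:
                self.__pending_accept = self.socket.accept()
            except socket.error:
                self.__pending_accept = None   # treated as EWOULDBLOCK
            if self.__pending_accept is not None:
                self.handle_accept()

        def accept(self):
            # User code calls this from handle_accept(); it can no longer
            # raise EWOULDBLOCK, because the accept() already happened.
            result, self.__pending_accept = self.__pending_accept, None
            return result

        def handle_accept(self):
            conn, addr = self.accept()
            conn.close()   # placeholder: real code would create a handler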
---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 07:53 Message: Logged In: YES user_id=341410 Originator: NO In asynchat, the only stuff that should be accepted is the handle_read() changes. The deque removal should be ignored (we have deques since Python 2.4, which are *significantly* faster than lists in nontrivial applications), and the iasync_chat stuff, like the idispatcher stuff, seems unnecessary. And that's pretty much it for asynchat. The proposed asynchttp module shouldn't go into the Python standard library until it has lived on its own for a nontrivial amount of time in the Cheeseshop and is found to be as good as httplib, urllib, or urllib2. Even then, its inclusion should be questioned, as medusa (the http server based on asyncore) has been around for a decade or more, is used in many places, and yet still isn't in the standard library. The asyncoreTest.py needs a bit of work (I notice some incorrect names), but could be used as an addition to the test suite (currently it seems as though only asynchat is tested). ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 07:42 Message: Logged In: YES user_id=341410 Originator: NO Many of the changes in the source provided by klimkin in his most recent revision from February 27, 2005 seek to solve certain problems in an inconsistent or incorrect way. Some of his changes (or variants thereof) are worthwhile. I'll start with my issues with his asyncore changes, then describe what I think should be added from them. For example, in his updated asyncore.py, the list of sockets is first shuffled randomly, then sorted based on priority. Assuming that one ignores priorities for a moment, if there were more sockets than the max sockets for the platform, then due to the limitations of randomness, there would be no guarantees that all sockets would get polled. Say, for example, that one were using windows and were running close to the actual select file handle limit (512 in Python 2.3) with 500 handles: you would skip 436 of the sockets *this pass*. In 10 passes, there would have been 100 sockets that were never polled. In 20 passes, there would still be, on average, 20 that were never polled. So this "randomization" step is the wrong thing to do, unless you actually make multiple select calls for each poll() call. But really, select is limited by 512, and I've run it with 500 without issue. The priority-based sorting has many of the same problems, but it is even worse when you have nontrivial numbers of differing priorities, regardless of randomization. The max socket limit of 64 on Windows isn't correct. It's been 512 since at least Python 2.3. And all other platforms being 65536? No. I've had some versions of linux die on me at 512, others at 4096, but all were dog slow beyond 500 or so. It's better to let the underlying system raise an exception for the user when it fails and let them attempt to tune it, rather than forcing a tuning that may not be correct. The "pending read" stuff is also misdirected. Assuming a non-broken async client or server, either should be handling content as it comes in, dispatching as necessary. See asynchat.collect_incoming_data() and asynchat.found_terminator() for examples. The idispatcher stuff seems unnecessary. Generally speaking, it seems to me that there are 3 levels of abstraction going on: 1) handle_*_event(), called by poll, poll2, etc.
2) handle_*(), called by handle_*_event(); the user overrides these, and they call other handle_*() and *() methods 3) *() (aka recv, send, close, etc.), called by handle_*(), generally left alone. Some of your code breaks the abstraction and has items in layer 2 call items in layer 1, which then call items in layer 2 again. This seems unnecessary, and breaks the general downward calling semantic (except in the case of errors returned by layer 3 resulting in layer 2 handle_close() calls, which is the proper method to call). There are, according to my reading of the asyncore portions of your included module, a few things that may be worthy of inclusion in the Python standard library: * A variant of your changes to close_all(), though it should proceed in closing everything unless a KeyboardInterrupt, SystemExit, or ExitNow exception is raised. Socket errors should be ignored, because we are closing them - we don't care about their error condition. * Checking sockets for socket error via socket.getsockopt(). * A variant of your .close() implementation. * The CONNRESET, etc., stuff in the send() and recv() methods, but not the handle_close_event() replacements; stick with handle_close(). * Checking for KeyboardInterrupt and SystemExit inside the poll functions. * The _closed_socket class and initialization. All but the last of the above I would consider to be bugfixes, and if others agree that these are reasonable changes, I'll write up a patch against trunk and 2.5 maintenance. The last change, while I think it would be nice, probably shouldn't be included in 2.5 maintenance, though I think it would be fine for the trunk. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2005-02-27 00:39 Message: Logged In: YES user_id=410460 Minor improvements: * Added handle_close_event(): calls handle_close(), then closes the channel. No need to write self.close() in each handle_close(). * Improved exception handling. KeyboardInterrupt is not blocked. For a Python exception, handle_error_event() is called, which checks for KeyboardInterrupt and closes the socket if handle_error() didn't. Bugs: * Calling connect() could raise an exception without hitting handle_error(). Now, if there was an exception, handle_error_event() is called. Features: * set_timeout(): sets a timeout for the dispatcher object; if there was no I/O for the object, ETIMEDOUT is raised, which is handled by handle_error_event(). * Fixed issue with Windows - too many descriptors in select(). The list of sockets is shuffled and only the first asyncore.max_channels are used in select(). * Added set_prio(): sets a priority for the dispatcher. After the shuffle, the list of sockets is sorted by priority. You may also check asynhttplib, an asynchronous version of httplib. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-07-02 17:44 Message: Logged In: YES user_id=410460 In addition to "[ 909005 ] asyncore fixes and improvements" and CVS version "asyncore.py,v 2.51" this patch provides: * Added handling of a buffered socket layer (pending_read(), pending_write()). * Added the fd number to __repr__. * Initialized self.socket = socket._closedsocket() instead of None for verbose error output (like a closed socket.socket). * asyncore and asynchat implement idispatcher and iasync_chat. * Fixed self.addr initialization. * Removed import exceptions. * Don't filter KeyboardInterrupt, just pass it through. * Added a queue of sockets, which solves the problem of select() on too many descriptors.
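For illustration, a hypothetical sketch of the buffered-layer idea behind pending_read() in the list above. It assumes a layer object exposing pending() and fileno() (an assumption for the sketch, e.g. an SSL binding with such methods - not klimkin's actual code).

    # A dispatcher over a buffered layer (e.g. an SSL connection that
    # decrypts 16k at a time) must tell the poll loop that buffered data is
    # waiting even when the underlying fd is not readable.
    class BufferedDispatcher(object):
        def __init__(self, sock_layer):
            self._layer = sock_layer   # assumed: has pending() and fileno()

        def pending_read(self):
            # True if the buffered layer already holds decoded bytes.
            return self._layer.pending() > 0

    def ready_for_read(dispatchers, readable_fds):
        # The poll loop treats "fd readable" OR "layer has buffered data"
        # as a read event, so buffered bytes are never stranded.
        return [d for d in dispatchers
                if d.pending_read() or d._layer.fileno() in readable_fds]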
I have run make test in python cvs distrib without problems. Examples of using i* included. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-06-05 21:54 Message: Logged In: YES user_id=11375 I've struggled to get the test suite running without errors on my machine, but have failed. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-22 09:15 Message: Logged In: YES user_id=410460 There is no real reason for this change, please undo. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:18 Message: Logged In: YES user_id=11375 In your version of file_dispatch.__init__, the .set_file() call is moved earlier; can you say why? ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:13 Message: Logged In: YES user_id=11375 Added "map" parameter for file_dispatcher and dispatcher_with_send in CVS HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:08 Message: Logged In: YES user_id=11375 Repeating exception changes ('raise socket.error' -> just 'raise') checked into HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:02 Message: Logged In: YES user_id=11375 Mac code for writable() removed from HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 23:02 Message: Logged In: YES user_id=11375 Patch to use True/False applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 22:55 Message: Logged In: YES user_id=11375 Fix for bug #887279 applied to HEAD. ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2004-03-21 22:48 Message: Logged In: YES user_id=11375 The many number of changes in this patch make it difficult to figure out which changes fix which problem. I've created a new directory in CVS, nondist/sandbox/asyncore, that contains copies of the module with these patches applied, and will work on applying changes to the copy in dist/src. ---------------------------------------------------------------------- Comment By: Alexey Klimkin (klimkin) Date: 2004-03-17 10:15 Message: Logged In: YES user_id=410460 Sorry, unfortunately I have lost old patch file. I have atached new one. In addition to fixes, listed above, the patch includes: 1. Fix for operating on uninitialized socket. self.socket now initializes with _closed_socket(), so any operation throws EBADF. 2. Added class idispatcher - base class for dispatcher. The purpose of this class is to allow simple replacement of media(dispatcher interface) in classes, derived from dispatcher class. This is based on 'object'. I have also attached asynchat.diff - example for new-style dispatcher. Old asynchat works as well. ---------------------------------------------------------------------- Comment By: Wummel (calvin) Date: 2004-03-11 18:49 Message: Logged In: YES user_id=9205 There is no file attached! You have to click on the checkbox next to the upload filename. 
This is a Sourceforge annoyance :( ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=909005&group_id=5470 From noreply at sourceforge.net Mon Jan 8 23:55:17 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 14:55:17 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 23:55 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Neal Norwitz (nnorwitz) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting the encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain when someone replaces only one of them (in sitecustomize). Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this from a unittest :P ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Tue Jan 9 02:10:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 17:10:56 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch.
Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation?
I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Tue Jan 9 02:26:29 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 17:26:29 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (A toy Python model of this cost profile is sketched below.)
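As a rough illustration only - a toy Python model of the cost profile described here, not the actual C implementation from the patch - concatenation can be O(1) while the first character access pays the one-time O(total length) render:

    class LazyConcat:
        # Toy model: defer joining the parts until a character is needed.
        def __init__(self, *parts):
            self.parts = list(parts)   # strings and/or LazyConcat nodes
            self.rendered = None

        def __add__(self, other):
            return LazyConcat(self, other)          # O(1) per concatenation

        def _render(self):
            if self.rendered is None:
                out, stack = [], [self]
                while stack:                        # walk the concat tree
                    node = stack.pop()
                    if isinstance(node, LazyConcat):
                        stack.extend(reversed(node.parts))
                    else:
                        out.append(node)
                self.rendered = "".join(out)        # O(total length), once
            return self.rendered

        def __getitem__(self, i):
            return self._render()[i]   # O(n) the first time, O(1) after

    s = LazyConcat("spam ") + "and " + "eggs"   # three O(1) operations
    s[0]                                        # triggers the single render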
(It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Tue Jan 9 02:26:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 17:26:33 -0800 Subject: [Patches] [ python-Patches-1631035 ] SyntaxWarning for backquotes Message-ID: Patches item #1631035, was opened at 2007-01-09 12:26 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631035&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Thomas Wouters (twouters) Summary: SyntaxWarning for backquotes Initial Comment: The following patch (for 2.6) issues a SyntaxWarning for backquotes/backticks in source code. I had to add the filename to struct compiling in Python/ast.c - this seems like the neatest way to get the filename passed around. (see also the XXX before ast_error) Assigned to twouters, since it was his idea in the first place. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631035&group_id=5470 From noreply at sourceforge.net Tue Jan 9 07:29:21 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 22:29:21 -0800 Subject: [Patches] [ python-Patches-1631171 ] implement warnings module in C Message-ID: Patches item #1631171, was opened at 2007-01-08 22:29 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631171&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: implement warnings module in C Initial Comment: Re-implement the warnings module in C for speed and to reduce start-up time. I don't remember the exact state of this patch; I'm sure it needs cleanup. IIRC the only thing missing feature-wise was processing command-line arguments, though I'm not entirely sure - it's been a while since I did it. I think I should not have used as many gotos in the code, and I recall not liking how complex the error handling was. This definitely needs review. If anyone wants to finish this off, go for it. I'll probably return to it, but it won't be for a few weeks at the earliest. It would probably be good to make comments to remind me of what needs to be done. The new file should be Python/_warnings.c. I couldn't decide whether to put it under Python/ or Modules/; it seems some builtin modules are in both places. Maybe we should determine where the appropriate place is and move them all there. I couldn't figure out how to get svn to do a diff of a file that wasn't checked in. I think I filtered out all the unrelated changes. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631171&group_id=5470 From noreply at sourceforge.net Tue Jan 9 07:30:09 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 22:30:09 -0800 Subject: [Patches] [ python-Patches-1631171 ] implement warnings module in C Message-ID: Patches item #1631171, was opened at 2007-01-08 22:29 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631171&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Neal Norwitz (nnorwitz) Assigned to: Nobody/Anonymous (nobody) Summary: implement warnings module in C Initial Comment: Re-implement the warnings module in C for speed and to reduce start-up time. I don't remember the exact state of this patch; I'm sure it needs cleanup. IIRC the only thing missing feature-wise was processing command-line arguments, though I'm not entirely sure - it's been a while since I did it. I think I should not have used as many gotos in the code, and I recall not liking how complex the error handling was. This definitely needs review. If anyone wants to finish this off, go for it. I'll probably return to it, but it won't be for a few weeks at the earliest. It would probably be good to make comments to remind me of what needs to be done. The new file should be Python/_warnings.c. I couldn't decide whether to put it under Python/ or Modules/; it seems some builtin modules are in both places. Maybe we should determine where the appropriate place is and move them all there. I couldn't figure out how to get svn to do a diff of a file that wasn't checked in. I think I filtered out all the unrelated changes. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-08 22:30 Message: Logged In: YES user_id=33168 Originator: YES File Added: _warnings.c ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631171&group_id=5470 From noreply at sourceforge.net Tue Jan 9 07:55:01 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 08 Jan 2007 22:55:01 -0800 Subject: [Patches] [ python-Patches-1631035 ] SyntaxWarning for backquotes Message-ID: Patches item #1631035, was opened at 2007-01-09 12:26 Message generated for change (Comment added) made by anthonybaxter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631035&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Thomas Wouters (twouters) Summary: SyntaxWarning for backquotes Initial Comment: The following patch (for 2.6) issues a SyntaxWarning for backquotes/backticks in source code. I had to add the filename to struct compiling in Python/ast.c - this seems like the neatest way to get the filename passed around. (see also the XXX before ast_error) Assigned to twouters, since it was his idea in the first place. ---------------------------------------------------------------------- >Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-09 17:55 Message: Logged In: YES user_id=29957 Originator: YES And here's another one (on top of the last one) that also emits SyntaxWarnings for <>. Not sure I like 'NOTEQUALSOLD' as a token name, but I couldn't think of anything better. This one should probably only be enabled when the Py3K warnings flag is on.
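For reference, these are the constructs the two warnings target (an illustrative Python 2 snippet; under these patches each line would draw a SyntaxWarning at compile time):

    x = `40 + 2`     # backticks, the old spelling of repr(40 + 2)
    if x <> '42':    # <>, the old spelling of !=
        print x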
File Added: notequals.diff ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631035&group_id=5470 From noreply at sourceforge.net Tue Jan 9 12:12:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 03:12:30 -0800 Subject: [Patches] [ python-Patches-1631394 ] sre module has misleading docs Message-ID: Patches item #1631394, was opened at 2007-01-09 11:12 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631394&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.4 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tom Lynn (tlynn) Assigned to: Nobody/Anonymous (nobody) Summary: sre module has misleading docs Initial Comment: >>> help(sre) ... "$" Matches the end of the string. ... \Z Matches only at the end of the string. ... M MULTILINE "^" matches the beginning of lines as well as the string. "$" matches the end of lines as well as the string. The docs for "$" are misleading - it actually matches in newline-specific ways which the module's built-in docs don't hint at. The MULTILINE docs don't clarify this. I'd also like to see "from sre import __doc__" added to the end of re.py; lack of "help(re)" is a bigger problem than having slightly wrong auto-generated docs for the re module itself. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631394&group_id=5470 From noreply at sourceforge.net Tue Jan 9 19:22:39 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 10:22:39 -0800 Subject: [Patches] [ python-Patches-1631394 ] sre module has misleading docs Message-ID: Patches item #1631394, was opened at 2007-01-09 12:12 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631394&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.4 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tom Lynn (tlynn) Assigned to: Nobody/Anonymous (nobody) Summary: sre module has misleading docs Initial Comment: >>> help(sre) ... "$" Matches the end of the string. ... \Z Matches only at the end of the string. ...
M MULTILINE "^" matches the beginning of lines as well as the string. "$" matches the end of lines as well as the string. The docs for "$" are misleading - it actually matches in newline-specific ways which the module's built-in docs don't hint at. The MULTILINE docs don't clarify this. I'd also like to see "from sre import __doc__" added to the end of re.py; lack of "help(re)" is a bigger problem than having slightly wrong auto-generated docs for the re module itself. ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-09 19:22 Message: Logged In: YES user_id=21627 Originator: NO Did you mean to include a patch? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631394&group_id=5470 From noreply at sourceforge.net Tue Jan 9 21:01:18 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 12:01:18 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Martin Kammerhofer (mkam) Assigned to: Thomas Heller (theller) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux-specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running FreeBSD 6.0, NetBSD 3.0, and OpenBSD 3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows:

    FreeBSD 6.0: libc.so.6, libm.so.4
    NetBSD 3.0:  libc.so.12, libm.so.0
    OpenBSD 3.9: libc.so.39.0, libm.so.2.1

If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?)
---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed-length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.) File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the a.out executable format.) Unfortunately my FreeBSD patch has the assumption of a single version number built in; more specifically, the cmp(*map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD is welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 on OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look?
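A minimal sketch of the more general sort mkam describes above (all trailing dot-separated numeric fields compared left to right; the helper name _version_key is hypothetical):

    import re

    def _version_key(libname):
        # "libc.so.39.0" -> (39, 0); "libfoo.so.10" -> (10,)
        m = re.search(r'((\.\d+)+)$', libname)
        if not m:
            return ()
        return tuple(int(part) for part in m.group(1).split('.')[1:])

    # Tuples compare elementwise, so libfoo.so.10 ranks above
    # libfoo.so.8.9, matching the "numerically highest version" intent.
    best = max(['libfoo.so.8.9', 'libfoo.so.10'], key=_version_key)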
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Tue Jan 9 23:54:09 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 14:54:09 -0800 Subject: [Patches] [ python-Patches-1500611 ] (py3k) Remove the sets module Message-ID: Patches item #1500611, was opened at 2006-06-04 16:38 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1500611&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: (py3k) Remove the sets module Initial Comment: This patch removes the sets module, its documentation and tests, in addition to replacing all usages of it with the built-in set type. The patch is against r46648. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-09 17:54 Message: Logged In: YES user_id=1344176 Originator: YES File Added: py3k-remove_sets_module.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2006-08-31 18:44 Message: Logged In: YES user_id=1344176 The patch has been updated to r51654. I'm not sure how well `svn diff` handles removed files, so you might have to `svn rm` Lib/sets.py, Lib/test/test_sets.py and Doc/lib/libsets.py manually. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-08-26 16:26 Message: Logged In: YES user_id=6380 This patch seems out of date -- can you refresh it? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1500611&group_id=5470 From noreply at sourceforge.net Wed Jan 10 02:29:17 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 17:29:17 -0800 Subject: [Patches] [ python-Patches-1500611 ] (py3k) Remove the sets module Message-ID: Patches item #1500611, was opened at 2006-06-04 16:38 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1500611&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: (py3k) Remove the sets module Initial Comment: This patch removes the sets module, its documentation and tests, in addition to replacing all usages of it with the built-in set type. The patch is against r46648. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 20:29 Message: Logged In: YES user_id=6380 Originator: NO Checked in. Thanks! 
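For readers updating their own code, the substitution this patch performs throughout the stdlib is essentially the following (illustrative only):

    # Before (2.x, the deprecated sets module):
    from sets import Set
    s = Set([1, 2]) | Set([2, 3])

    # After (the built-in set type):
    s = set([1, 2]) | set([2, 3])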
---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 17:54 Message: Logged In: YES user_id=1344176 Originator: YES File Added: py3k-remove_sets_module.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2006-08-31 18:44 Message: Logged In: YES user_id=1344176 The patch has been updated to r51654. I'm not sure how well `svn diff` handles removed files, so you might have to `svn rm` Lib/sets.py, Lib/test/test_sets.py and Doc/lib/libsets.py manually. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-08-26 16:26 Message: Logged In: YES user_id=6380 This patch seems out of date -- can you refresh it? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1500611&group_id=5470 From noreply at sourceforge.net Wed Jan 10 03:12:52 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 18:12:52 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 03:13:23 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 18:13:23 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. 
new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 03:13:53 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 18:13:53 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 03:14:58 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 18:14:58 -0800 Subject: [Patches] [ python-Patches-1630248 ] Implement named exception cleanup Message-ID: Patches item #1630248, was opened at 2007-01-07 22:02 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. 
Category: Core (C code) Group: Python 3000 >Status: Closed >Resolution: Duplicate Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: Implement named exception cleanup Initial Comment: This patch implements the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. Specifically,

    try:
        ...
    except ExcType, e:
        # body

is translated to

    try:
        ...
    except ExcType, e:
        try:
            # body
        finally:
            e = None
            del e

The attached patches are against r53289. exc_cleanup.patch is the implementation and testcases, while stdlib_fixes.patch repairs all places in the stdlib that depended on the old behaviour. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:14 Message: Logged In: YES user_id=1344176 Originator: YES This patch has been superseded by patch #1631942. ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:51 Message: Logged In: YES user_id=1344176 Originator: YES Patches updated in response to PJE's comment (http://mail.python.org/pipermail/python-3000/2007-January/005430.html): """In the tuple or list case, there's no need to reset the variables, because then the traceback won't be present any more; the exception object will have been discarded after unpacking.""" ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:50 Message: Logged In: YES user_id=1344176 Originator: YES File Added: exc_cleanup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-08 11:49 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:34 Message: Logged In: YES user_id=1344176 Originator: YES This is the first time I've done this kind of surgery on the compiler, so any tips/tricks/advice would be greatly appreciated. ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-07 22:02 Message: Logged In: YES user_id=1344176 Originator: YES File Added: stdlib_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630248&group_id=5470 From noreply at sourceforge.net Wed Jan 10 04:41:43 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 19:41:43 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: None >Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib?

    patching file Lib/tarfile.py
    Hunk #1 succeeded at 1540 (offset 38 lines).
    Hunk #3 succeeded at 1573 (offset 38 lines).
    Hunk #5 succeeded at 1745 (offset 38 lines).
    Hunk #7 succeeded at 1786 (offset 38 lines).
    Hunk #9 FAILED at 1795.
    1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej

---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 06:39:15 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 09 Jan 2007 21:39:15 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 Status: Open >Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 00:39 Message: Logged In: YES user_id=6380 Originator: NO For some strange reason, test_exceptions was wrong.
I'm guessing that the newly added test should be this:

    def testExceptionCleanup(self):
        # Make sure "except V as N" exceptions are cleaned up properly
        try:
            raise Exception()
        except Exception as e:
            self.failUnless(e)
        self.failIf('e' in locals())

(it had ', e' instead of 'as e', and there was an unneeded 'del e' after self.failUnless(e).) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib?

    patching file Lib/tarfile.py
    Hunk #1 succeeded at 1540 (offset 38 lines).
    Hunk #3 succeeded at 1573 (offset 38 lines).
    Hunk #5 succeeded at 1745 (offset 38 lines).
    Hunk #7 succeeded at 1786 (offset 38 lines).
    Hunk #9 FAILED at 1795.
    1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej

---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 12:58:00 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 03:58:00 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by mkam You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Martin Kammerhofer (mkam) Assigned to: Thomas Heller (theller) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.)
While at it I also tidied up the Linux-specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 12:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails). I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running FreeBSD 6.0, NetBSD 3.0, and OpenBSD 3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows:

    FreeBSD 6.0: libc.so.6, libm.so.4
    NetBSD 3.0:  libc.so.12, libm.so.0
    OpenBSD 3.9: libc.so.39.0, libm.so.2.1

If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed-length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.) File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the a.out executable format.) Unfortunately my FreeBSD patch has the assumption of a single version number built in; more specifically, the cmp(*map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD is welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like?
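To make the discussion concrete, here is a rough sketch of the FreeBSD-style lookup being described (the exact "ldconfig -r" output format, e.g. "382:-lc.6 => /usr/lib/libc.so.6", is an assumption here, as is the helper name _findLib_bsd; the real patch may differ):

    import os
    import re

    def _findLib_bsd(name):
        # Scan 'ldconfig -r' output for '-l<name>.<ver> => /path/lib<name>...'
        ename = re.escape(name)
        expr = re.compile(r':-l%s\.\S+ => \S*/(lib%s\.\S+)' % (ename, ename))
        with os.popen('/sbin/ldconfig -r 2>/dev/null') as f:
            candidates = expr.findall(f.read())
        if not candidates:
            return None
        # Prefer the numerically highest version (cf. the sort sketch above).
        def key(lib):
            m = re.search(r'((\.\d+)+)$', lib)
            return tuple(int(p) for p in m.group(1).split('.')[1:]) if m else ()
        return max(candidates, key=key)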
---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 on OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Wed Jan 10 15:40:24 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 06:40:24 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-10 09:40 Message: Logged In: YES user_id=1344176 Originator: YES I think there were only four files I had to patch manually after running 2to3; each used automatic exception unpacking. 2to3 successfully fixes Lib/tarfile.py (as of tarfile.py r53336, 2to3 r53339). The 'del e' in testExceptionCleanup() was indeed needed; it was there to verify that the transformation was

    except V as N:
        try:
            ...
        finally:
            N = None
            del N

and not

    except V as N:
        try:
            ...
        finally:
            del N

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 00:39 Message: Logged In: YES user_id=6380 Originator: NO For some strange reason, test_exceptions was wrong. I'm guessing that the newly added test should be this:

    def testExceptionCleanup(self):
        # Make sure "except V as N" exceptions are cleaned up properly
        try:
            raise Exception()
        except Exception as e:
            self.failUnless(e)
        self.failIf('e' in locals())

(it had ', e' instead of 'as e', and there was an unneeded 'del e' after self.failUnless(e).) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib?

    patching file Lib/tarfile.py
    Hunk #1 succeeded at 1540 (offset 38 lines).
    Hunk #3 succeeded at 1573 (offset 38 lines).
    Hunk #5 succeeded at 1745 (offset 38 lines).
    Hunk #7 succeeded at 1786 (offset 38 lines).
    Hunk #9 FAILED at 1795.
    1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej

---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 17:23:41 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 08:23:41 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 11:23 Message: Logged In: YES user_id=6380 Originator: NO Thanks!! Submitted, with tarfile.py and test_exceptions.py corrected (kept the 'del e' in the latter). Committed revision 53342.
Note: there is now a new test_hotshot failure, probably due to the different code generated for except clauses; I'm keeping this patch open for that. Is it time to drop the unpacking (sequence) behavior from exceptions altogether, per Brett's PEP? (That would be a new SF patch.) ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-10 09:40 Message: Logged In: YES user_id=1344176 Originator: YES I think there were only four files I had to patch manually after running 2to3; each used automatic exception unpacking. 2to3 successfully fixes Lib/tarfile.py (as of tarfile.py r53336, 2to3 r53339). The 'del e' in testExceptionCleanup() was indeed needed; it was there to verify that the transformation was

    except V as N:
        try:
            ...
        finally:
            N = None
            del N

and not

    except V as N:
        try:
            ...
        finally:
            del N

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 00:39 Message: Logged In: YES user_id=6380 Originator: NO For some strange reason, test_exceptions was wrong. I'm guessing that the newly added test should be this:

    def testExceptionCleanup(self):
        # Make sure "except V as N" exceptions are cleaned up properly
        try:
            raise Exception()
        except Exception as e:
            self.failUnless(e)
        self.failIf('e' in locals())

(it had ', e' instead of 'as e', and there was an unneeded 'del e' after self.failUnless(e).) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib?

    patching file Lib/tarfile.py
    Hunk #1 succeeded at 1540 (offset 38 lines).
    Hunk #3 succeeded at 1573 (offset 38 lines).
    Hunk #5 succeeded at 1745 (offset 38 lines).
    Hunk #7 succeeded at 1786 (offset 38 lines).
    Hunk #9 FAILED at 1795.
    1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej

---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470
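For readers following the thread, the cleanup semantics the quoted test verifies can be observed directly. A small illustration, runnable under a Python with the new "except ... as" syntax (this is just the behavior, not code from the patch):

    try:
        raise Exception()
    except Exception as e:
        print(e is not None)   # True: the name is bound inside the handler
    print('e' in locals())     # False: the implicit "N = None; del N"
                               # ran when the handler exited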
From noreply at sourceforge.net Wed Jan 10 19:24:04 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 10:24:04 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 01:37 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer.
This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 10:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 17:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 17:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 10:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2007-01-08 02:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 21:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Wed Jan 10 21:00:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 12:00:30 -0800 Subject: [Patches] [ python-Patches-1631942 ] New exception syntax Message-ID: Patches item #1631942, was opened at 2007-01-09 21:12 Message generated for change (Comment added) made by collinwinter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 >Status: Closed Resolution: Accepted Priority: 5 Private: No Submitted By: Collin Winter (collinwinter) Assigned to: Nobody/Anonymous (nobody) Summary: New exception syntax Initial Comment: The attached patches implement the new "except V as N:" syntax and the solution outlined in http://mail.python.org/pipermail/python-3000/2007-January/005395.html for avoiding exception-related refcount cycles. new_exceptions.patch is the implementation and tests. fixup.patch adjusts the stdlib to use the new syntax. doc_fixes.patch fixes documentation and some docs-related utilities missed by Guido's 2to3 code. All patches are against r53289. ---------------------------------------------------------------------- >Comment By: Collin Winter (collinwinter) Date: 2007-01-10 15:00 Message: Logged In: YES user_id=1344176 Originator: YES The hotshot failure may have been related to magic number/bytecode differences, but since a "make clean" resolves the problem, the issue is considered closed (http://mail.python.org/pipermail/python-3000/2007-January/005501.html).
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 11:23 Message: Logged In: YES user_id=6380 Originator: NO Thanks!! Submitted, with tarfile.py and test_exceptions.py corrected (kept the 'del e' in the latter). Committed revision 53342. Note: there is now a new test_hotshot failure, probably due to the different code generated for except clauses; I'm keeping this patch open for that. Is it time to drop the unpacking (sequence) behavior from exceptions altogether, per Brett's PEP? (That would be a new SF patch.) ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-10 09:40 Message: Logged In: YES user_id=1344176 Originator: YES I think there were only four files I had to patch manually after running 2to3; each used automatic exception unpacking. 2to3 successfully fixes Lib/tarfile.py (as of tarfile.py r53336, 2to3 r53339). The 'del e' in testExceptionCleanup() was indeed needed; it was there to verify that the transformation was

    except V as N:
        try:
            ...
        finally:
            N = None
            del N

and not

    except V as N:
        try:
            ...
        finally:
            del N

---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-10 00:39 Message: Logged In: YES user_id=6380 Originator: NO For some strange reason, test_exceptions was wrong. I'm guessing that the newly added test should be this:

    def testExceptionCleanup(self):
        # Make sure "except V as N" exceptions are cleaned up properly
        try:
            raise Exception()
        except Exception as e:
            self.failUnless(e)
        self.failIf('e' in locals())

(it had ', e' instead of 'as e', and there was an unneeded 'del e' after self.failUnless(e).) ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-09 22:41 Message: Logged In: YES user_id=6380 Originator: NO Reviewing... Seems the merge from the 2.6 trunk that Thomas did made some changes to tarfile.py. Did you have to manually patch anything up after running 2to3/refactor.py -f except on the entire stdlib? patching file Lib/tarfile.py Hunk #1 succeeded at 1540 (offset 38 lines). Hunk #3 succeeded at 1573 (offset 38 lines). Hunk #5 succeeded at 1745 (offset 38 lines). Hunk #7 succeeded at 1786 (offset 38 lines). Hunk #9 FAILED at 1795.
1 out of 9 hunks FAILED -- saving rejects to file Lib/tarfile.py.rej ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: fixup.patch ---------------------------------------------------------------------- Comment By: Collin Winter (collinwinter) Date: 2007-01-09 21:13 Message: Logged In: YES user_id=1344176 Originator: YES File Added: doc_fixes.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1631942&group_id=5470 From noreply at sourceforge.net Wed Jan 10 21:24:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 12:24:51 -0800 Subject: [Patches] [ python-Patches-1627052 ] backticks will not be used at all Message-ID: Patches item #1627052, was opened at 2007-01-03 15:21 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Georg Brandl (gbrandl) Summary: backticks will not be used at all Initial Comment: In python 3, backticks will not mean repr. Every few months, someone suggests a new meaning for them. This clarifies that they won't be reused at all. ---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-10 20:24 Message: Logged In: YES user_id=849994 Originator: NO Committed as rev. 53359. Had to fix the markup a bit. Can anyone tell me how to include a lone backtick in a ReST `` `` block? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-03 15:22 Message: Logged In: YES user_id=764593 Originator: YES Assigning to PEP owner, Georg. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 From noreply at sourceforge.net Wed Jan 10 21:30:36 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 12:30:36 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. 
There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)
File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
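To see the tradeoff being debated here, a rough micro-benchmark along these lines (hypothetical harness, numbers vary by build and interpreter) compares the += loop against the ''.join() idiom:

    import timeit

    def concat_loop(n=10000):
        s = ''
        for i in range(n):
            s += 'x' * 10       # quadratic on builds without concat tricks
        return s

    def join_list(n=10000):
        parts = []
        for i in range(n):
            parts.append('x' * 10)
        return ''.join(parts)   # one final linear-time copy

    print(timeit.timeit(concat_loop, number=10))
    print(timeit.timeit(join_list, number=10))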
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Wed Jan 10 21:59:14 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 12:59:14 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 10:37 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 21:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces.
To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 21:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 19:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 02:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 02:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode().
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 19:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 11:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 06:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
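As a thought experiment only (the real patch is C code inside unicodeobject.c; this toy class merely mirrors the idea), a pure-Python rope-like object makes the O(1)-concatenate / O(n)-render split concrete:

    class LazyConcat:
        # Toy analogue of a lazy concatenation object: '+' builds a
        # two-child node in O(1); rendering walks the tree once and
        # caches the result, after which indexing is O(1) again.
        def __init__(self, left, right):
            self.left, self.right = left, right
            self._rendered = None

        def __add__(self, other):
            return LazyConcat(self, other)

        def __str__(self):
            if self._rendered is None:
                self._rendered = str(self.left) + str(self.right)
            return self._rendered

    s = LazyConcat('abc', 'def') + 'ghi'
    print(str(s))    # 'abcdefghi' -- the copying happens here, not at '+'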
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Thu Jan 11 01:13:03 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 16:13:03 -0800 Subject: [Patches] [ python-Patches-1627052 ] backticks will not be used at all Message-ID: Patches item #1627052, was opened at 2007-01-03 10:21 Message generated for change (Comment added) made by goodger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 3000 Status: Closed Resolution: Accepted Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Georg Brandl (gbrandl) Summary: backticks will not be used at all Initial Comment: In python 3, backticks will not mean repr. Every few months, someone suggests a new meaning for them. This clarifies that they won't be reused at all. ---------------------------------------------------------------------- >Comment By: David Goodger (goodger) Date: 2007-01-10 19:13 Message: Logged In: YES user_id=7733 Originator: NO Just do it: "`````". The meaning tends to get lost in the noise though. What you did is fine, but you don't need the backslash-escape. reST is smart enough to realize that (`) is ` in parentheses. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2007-01-10 15:24 Message: Logged In: YES user_id=849994 Originator: NO Committed as rev. 53359. Had to fix the markup a bit. Can anyone tell me how to include a lone backtick in a ReST `` `` block? ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-03 10:22 Message: Logged In: YES user_id=764593 Originator: YES Assigning to PEP owner, Georg. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627052&group_id=5470 From noreply at sourceforge.net Thu Jan 11 02:20:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 17:20:27 -0800 Subject: [Patches] [ python-Patches-1615701 ] Creating dicts for dict subclasses Message-ID: Patches item #1615701, was opened at 2006-12-14 08:08 Message generated for change (Settings changed) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615701&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Walter Dörwald (doerwalter) >Assigned to: Raymond Hettinger (rhettinger) Summary: Creating dicts for dict subclasses Initial Comment: This patch changes dictobject.c so that creating dicts from mapping-like objects only uses the internal dict functions if the argument is a *real* dict, not a subclass. This means that overwritten keys() and __getitem__() methods are now honored. In addition to that, the fallback implementation now tries iterkeys() before trying keys(). It also adds a PyMapping_IterKeys() macro.
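The behaviour this patch targets is easy to demonstrate. A short sketch (the class names are invented for illustration): on an unpatched interpreter, dict() silently bypasses overridden methods when the argument is a dict subclass, while honouring them on a plain mapping object:

    class LoudDict(dict):
        def keys(self):
            print("keys() called")
            return dict.keys(self)
        def __getitem__(self, key):
            print("__getitem__ called")
            return dict.__getitem__(self, key)

    class LoudMapping:
        def keys(self):
            print("keys() called")
            return ['a']
        def __getitem__(self, key):
            print("__getitem__ called")
            return 1

    dict(LoudDict(a=1))    # fast path: prints nothing, overrides ignored
    dict(LoudMapping())    # mapping path: both overridden methods run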
---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2006-12-20 07:59 Message: Logged In: YES user_id=89016 Originator: YES To clear up some apparent misunderstandings: This patch does *not* advocate that some dict methods should be implemented by calling other dict methods so that dict subclasses only have to overwrite a few methods to gain a completely consistent implementation. This patch only fixes the dict constructor (and update()) and consists of two parts: (1) There are three code paths in dictobject.c::dict_update_common():

(a) if the constructor argument doesn't have a "keys" attribute, treat it as an iterable of items;

(b) if the argument has a "keys" attribute, but is not a dict (and not an instance of a subclass of dict), use keys() and __getitem__() to make a copy of the mapping-like object;

(c) if the argument has a "keys" attribute and is a dict (or an instance of a subclass of dict), bypass any of the overwritten methods that the object might provide and directly use the dict implementation.

This patch changes PyDict_Merge() so that code path (b) is used for dict constructor arguments that are subclasses of dict, so that any overwritten methods are honored. (2) This means that now, if a subclass of dict is passed to the constructor or update(), the code is IMHO more correct (it honors the reimplementation of the mapping methods), but slower. To reduce the slowdown, instead of using keys() and __getitem__(), iterkeys() and __getitem__() are used. I can't see why the current behaviour should be better: Yes, it is faster, but it is also wrong: Without the patch the behaviour of dict() and dict.update() depends on whether the argument happens to subclass dict or not. If it doesn't, all is well: the argument is treated as a mapping (i.e. keys() and __getitem__() are used); otherwise the methods are completely ignored. So can we agree on the fact that (1) is desirable? (At least Guido said as much: http://mail.python.org/pipermail/python-dev/2006-December/070341.html) BTW, I only added PyMapping_Iterkeys() because it mirrors PyMapping_Keys(). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-19 19:13 Message: Logged In: YES user_id=80475 Originator: NO Since update already supports (key, item) changes, I do not see the rationale in trying to expand the definition of what is dict-like to include a try-this, then try-that approach. This is a little too ad-hoc for too little benefit. Also, I do not see the point of adding PyMapping_Iterkeys to the C API. It affords no advantage over its macro definition (the current one-way-to-do-it). ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-19 18:00 Message: Logged In: YES user_id=80475 Originator: NO It is also asking for bugs if someone hooks __getitem__ and starts to make possibly invalid assumptions about what other changes occur implicitly. Also, this patch kills the performance of builtin subclasses. If I subclass dict to add a new method, it would suck to have the performance of all of the other methods drop precariously. This patch is somewhat overzealous. It encroaches on the territory of UserDict.DictMixin which was specifically made for propagating new behaviors. It unnecessarily exposes implementation details.
It introduces implicit behaviors that should be done through explicit overrides of methods whose behavior is supposed to change. And, it is at the top of a slippery slope that we don't want to go down (i.e. do we want to guarantee that list.append is implemented in terms of list.extend, etc). Python has no shortage of places where builtin subclasses make direct calls to the underlying C code -- this patch leads to a bottomless pit of changes that kill performance and make implicit side-effects the norm instead of the exception. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-19 17:29 Message: Logged In: YES user_id=764593 Originator: NO FWIW, I'm not sure I agree on not specifying which methods can share implementation. If someone hooks __getitem__ but not get, that is just asking for bugs. (The implementation of get may -- but need not -- make its own call to __getitem__, and not everyone will make the same decision.) ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-19 17:26 Message: Logged In: YES user_id=764593 Originator: NO As I understand it, the problem is that dict.update is assuming any dict subclass will use the same internal data representation. Restricting the fast path to exactly builtin dicts (not subclasses) fixes the bug, but makes the fallback more frequent. The existing fallback is to call keys(), then iterate over it, retrieving the value for each key. (keys is required for a "minimal mapping" as documented in UserDict, and a few other random places.) The only potential dependency on other methods is his proposed new intermediate path that avoids creating a list of all keys, by using iterkeys if it exists. (I suggested using iteritems to avoid the lookups.) If iter* aren't implemented, the only harm is falling back to the existing fallback of "for k in keys():" ---------------------------------------------------------------------- Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-19 16:07 Message: Logged In: YES user_id=80475 Originator: NO I'm -1 on making ANY guarantees about which methods underlie others -- that would constitute new and everlasting guarantees about how mappings are implemented. Subclasses should explicitly override/extend the methods with changed behavior. If that proves non-trivial, then it is likely there should be a has-a relationship instead of an is-a relationship. Either way, there is probably a design flaw. Also, it is likely that the subclass will have Liskov substitutability violations. ---------------------------------------------------------------------- Comment By: Walter Dörwald (doerwalter) Date: 2006-12-19 14:23 Message: Logged In: YES user_id=89016 Originator: YES iteritems() has to create a new tuple for each item, so this might be slower. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-19 12:50 Message: Logged In: YES user_id=764593 Originator: NO Why are you using iterkeys instead of iteritems? It seems like if they've filled out the interface enough to have iterkeys, they've probably filled it out all the way, and you do need the value as soon as you get the key.
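In rough Python terms (a sketch of the control flow only, not the actual C code in dict_update_common()), the three paths Walter describes look like this:

    def update_from(target, source):
        if type(source) is dict:
            # path (c): a real dict goes through the internal fast path
            target.update(source)
        elif hasattr(source, 'keys'):
            # path (b): treat anything with keys() as a mapping; prefer
            # iterkeys() when available to avoid building the key list
            get_keys = getattr(source, 'iterkeys', source.keys)
            for key in get_keys():
                target[key] = source[key]
        else:
            # path (a): otherwise an iterable of (key, value) pairs
            for key, value in source:
                target[key] = value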
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615701&group_id=5470 From noreply at sourceforge.net Thu Jan 11 08:55:58 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 10 Jan 2007 23:55:58 -0800 Subject: [Patches] [ python-Patches-1630118 ] Patch to add tempfile.SpooledTemporaryFile (for #415692) Message-ID: Patches item #1630118, was opened at 2007-01-07 20:36 Message generated for change (Comment added) made by arigo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Dustin J. Mitchell (djmitche) Assigned to: Nobody/Anonymous (nobody) Summary: Patch to add tempfile.SpooledTemporaryFile (for #415692) Initial Comment: Attached please find a patch that adds a SpooledTemporaryFile class to tempfile, along with the corresponding documentation (optimistically labeling the feature as added in Python 2.5) and some test cases. ---------------------------------------------------------------------- >Comment By: Armin Rigo (arigo) Date: 2007-01-11 07:55 Message: Logged In: YES user_id=4771 Originator: NO Reimplementing the whole file interface as proxy functions might be the safest route, yes. I realized that the __getattr__() magic is also used to serve at least one special method, namely the __iter__() of the file objects. This only works with old-style classes. In the long-term future, when old-style classes disappear and these classes become new-style, this is likely to introduce a subtle bug. ---------------------------------------------------------------------- Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-08 15:53 Message: Logged In: YES user_id=7446 Originator: YES I agree it would break in such a situation, but I'm not clear on which direction your bias leads you (specifically, which do we get right -- don't use bound methods, or don't use the __getattr__ magic?). I could fix this by defining "proxy" functions (and some properties) for the whole file interface, rather than just the methods that potentially trigger rollover. That would lose a little efficiency, but mostly only in reading (calling, e.g., f.read() will always result in two function applications; in the current model, after the first call it runs at "native" speed). It would also lose forward compatibility if the file protocol changes, although I'm not sure how likely that is. Would you like me to do that? ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2007-01-08 08:26 Message: Logged In: YES user_id=4771 Originator: NO The __getattr__ magic makes the following kind of code fail with SpooledTemporaryFile:

    f = SpooledTemporaryFile(max_size=something)
    rd = f.read
    wr = f.write
    for x in y:
        ...use rd(size) and wr(data)...

The problem is that the captured 'f.read' method is the one from the StringIO instance, even after the write() rolled the file over to disk. Given that capturing bound methods is a semi-official speed hack advertised in some respected places, we might have to be careful about it. About such matters I am biased towards first getting it right and then getting it fast...
Also, Python 2.5 is already out, so this will probably be a 2.6 addition. ---------------------------------------------------------------------- Comment By: Dustin J. Mitchell (djmitche) Date: 2007-01-07 20:37 Message: Logged In: YES user_id=7446 Originator: YES File Added: SpooledTemporaryFile.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630118&group_id=5470 From noreply at sourceforge.net Thu Jan 11 22:50:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 11 Jan 2007 13:50:27 -0800 Subject: [Patches] [ python-Patches-1598415 ] Logging Module - followfile patch Message-ID: Patches item #1598415, was opened at 2006-11-17 15:44 Message generated for change (Comment added) made by vsajip You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.5 Status: Open Resolution: Invalid Priority: 5 Private: No Submitted By: chads (cjschr) Assigned to: Vinay Sajip (vsajip) Summary: Logging Module - followfile patch Initial Comment: Pertaining to the FileHandler and the file being written to: It's possible that the file being written to will be rolled over by an external application such as newsyslog. By default, FileHandler tracks the file descriptor, not the file. If the original file is renamed, the file descriptor is still updated; however, it's probably desired that continued updates to the original file take place instead. This patch adds an attribute to the FileHandler class constructor (and basicConfig kw as well). If the attribute evaluates to True, the filename, not the descriptor, is tracked. Basically, the code compares the file status from a previous emit call to the current call before the base class emit is called. If a difference in st_ino or st_dev is found, the current stream is flushed/closed and a new one, based on baseFilename, is created, file status is updated, and then the base class emit is called. ---------------------------------------------------------------------- >Comment By: Vinay Sajip (vsajip) Date: 2007-01-11 21:50 Message: Logged In: YES user_id=308438 Originator: NO I've had a bit more of a think about this, and realised that I made a boo-boo in one of my earlier comments. Under Windows, log files are opened with exclusive locks, so that other processes cannot rename or move files which are open. So I believe the approach won't work at all under Windows. (Chad, sorry about making you redo the patch with ST_SIZE rather than ST_DEV and ST_INO). I also think this is a less common use case than warrants supporting it at the basicConfig() level, which is for really very basic usage configuration. So I would advocate adding a WatchedFileHandler (in logging.handlers) which watches st_dev and st_ino (as per Chad's original patch) and closes the old file descriptor and reopens the file when a change is seen. Some recent changes checked into SVN trunk facilitate the reopening - I've added an _open() method to FileHandler to do this. Chad, what do you think of this approach? ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 17:06 Message: Logged In: YES user_id=1093928 Originator: YES Uploaded the wrong diff.
----------------------------------------------------------------------
Comment By: chads (cjschr) Date: 2006-11-20 17:06 Message: Logged In: YES user_id=1093928 Originator: YES
Uploaded the wrong diff. This is the correct one.

----------------------------------------------------------------------
Comment By: chads (cjschr) Date: 2006-11-20 17:02 Message: Logged In: YES user_id=1093928 Originator: YES
Updated per vsajip to work on Windoze too. The code now checks for a current size < previous size (based on ST_SIZE).

----------------------------------------------------------------------
Comment By: Vinay Sajip (vsajip) Date: 2006-11-19 20:32 Message: Logged In: YES user_id=308438 Originator: NO
This patch, relying as it does on Unix-specific details such as i-nodes, does not appear as if it will work under Windows. For that reason I will mark it as Pending and Invalid for now; if cjschr can update this tracker item with how the patch will work on Windows, I will look at it further. The SF system will automatically close it if no update is made to the item in approx. 2 weeks, though it can still be reopened after that.

----------------------------------------------------------------------
Comment By: Georg Brandl (gbrandl) Date: 2006-11-18 19:14 Message: Logged In: YES user_id=849994 Originator: NO
Assigning to Vinay.

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470

From noreply at sourceforge.net Fri Jan 12 03:42:11 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 18:42:11 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Larry Hastings (lhastings)
Assigned to: Nobody/Anonymous (nobody)
Summary: The Unicode "lazy strings" patches

Initial Comment:
These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent.

----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES
lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes.
To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least likely to most likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors.
2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.
3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.
4. The patch is not accepted.

Of course, I'm open to suggestions of other approaches. (Not to mention patches!)

Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does.

As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination.

* Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO
Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons).
In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design.

----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES
Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted.

----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO
From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode?

----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES
Continuing the comedy of errors: concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)
File Added: lch.py3k.unicode.lazy.concat.patch.3.txt

----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES
Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt

----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES
jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate.

lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense.

----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO
While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. a list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage
* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them)
* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO
What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
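
The cost model Larry describes (and jcarlson asks about) is easy to see in a pure-Python toy. The real patch does this in C inside unicodeobject.c; the class below only models the claimed complexities, nothing more: '+' just records the operands, the first access pays one linear render, and indexing is O(1) thereafter.

    class LazyConcat:
        def __init__(self, pieces):
            self.pieces = pieces        # flat list of ordinary strings
            self.value = None           # rendered result, once computed

        def __add__(self, other):
            tail = other.pieces if isinstance(other, LazyConcat) else [other]
            # copies piece *pointers*, not characters: cheap per concatenation
            return LazyConcat(self.pieces + tail)

        def render(self):
            if self.value is None:
                self.value = ''.join(self.pieces)   # the one O(total length) step
                self.pieces = [self.value]
            return self.value

        def __getitem__(self, i):
            return self.render()[i]     # linear on first use, O(1) afterwards

    a = LazyConcat(['spam'])
    s = a + 'and' + 'eggs'      # no character copying has happened yet
    c = s[4]                    # first index renders 'spamandeggs'; later
                                # lookups hit the cached rendered string

So the answer to jcarlson's question in this model: (a + b + c + ...)[i] pays the render once, proportional to the total data, and is O(1) from then on.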
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Fri Jan 12 03:50:13 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 18:50:13 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES
File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Fri Jan 12 04:12:28 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 19:12:28 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470
----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES
Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html

One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away.

As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.)

I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.

File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt
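
The pathological case Larry mentions, and what .simplify() is meant to buy, can likewise be modelled in a few lines of Python. This is a toy stand-in for the C patch: only the method name simplify() comes from the comment above; everything else is illustrative.

    class LazySlice:
        def __init__(self, base, start, stop):
            self.base = base            # pins the whole base string in memory
            self.start, self.stop = start, stop
            self.value = None

        def render(self):
            if self.value is None:
                self.value = self.base[self.start:self.stop]
                self.base = None        # the giant base can now be collected
            return self.value

        def simplify(self):
            self.render()               # force the copy, drop the base
            return self                 # same string, different bits underneath

    huge = 'x' * (100 * 1024 * 1024)    # ~100 MB
    tiny = LazySlice(huge, 0, 5)        # cheap: no characters copied
    del huge                            # ...but the 100 MB buffer stays alive,
                                        # still reachable through tiny.base
    tiny.simplify()                     # copies 5 chars; the 100 MB is freeable

The design question Larry raises maps directly onto the last line: simplify() mutates in place and returns self, rather than returning a new object, because the externally visible value never changes.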
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Fri Jan 12 05:25:49 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 20:25:49 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES
File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Fri Jan 12 05:32:06 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 11 Jan 2007 20:32:06 -0800
Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches
Message-ID:

Patches item #1629305, was opened at 2007-01-06 09:37
Message generated for change (Comment added) made by lhastings
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470
There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch.
However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2).
While this error is still common among new users of Python, generally users only get bitten once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear in *what* depends on the inputs. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them?
(currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Fri Jan 12 07:55:34 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 11 Jan 2007 22:55:34 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 01:37 Message generated for change (Comment added) made by josiahcarlson You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-11 22:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant.
Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful; it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, it would be acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 20:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 20:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 19:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch.
However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 18:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 18:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2007-01-10 12:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 12:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 10:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2).
While this error is still common among new users of Python, generally users only get bitten once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 17:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 17:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 10:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear in *what* depends on the inputs. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 02:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them?
(currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-06 21:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Fri Jan 12 08:13:37 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 11 Jan 2007 23:13:37 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-12 18:13 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Nobody/Anonymous (nobody) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Fri Jan 12 08:31:26 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 11 Jan 2007 23:31:26 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-12 18:13 Message generated for change (Comment added) made by anthonybaxter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Nobody/Anonymous (nobody) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. ---------------------------------------------------------------------- >Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-12 18:31 Message: Logged In: YES user_id=29957 Originator: YES Updated version of patch - fixes interactive mode, adds builtins.print File Added: print_function.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Fri Jan 12 18:57:07 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 09:57:07 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. 
The empty string (option 2) or nonempty but fixed-size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful; it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, it would be acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.)
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return.
Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string.
I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bitten once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode().
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear in *what* depends on the inputs. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
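To make the wrapper/view alternative raised in the comments above concrete (Lemburg's slice-integers technique, Carlson's proposed view type), here is a minimal pure-Python sketch; the class and its names are invented for illustration and are not part of any patch on this item:

    # Invented illustration of a "string view": slicing records offsets
    # into the base string instead of copying characters.
    class StrView(object):
        def __init__(self, base, start=0, stop=None):
            if stop is None:
                stop = len(base)
            self.base, self.start, self.stop = base, start, stop

        def __len__(self):
            return self.stop - self.start

        def __getitem__(self, i):
            # O(1) indexing straight into the base string.
            if not 0 <= i < len(self):
                raise IndexError(i)
            return self.base[self.start + i]

        def slice(self, i, j):
            # O(1): a new view over the same base; no characters copied.
            return StrView(self.base, self.start + i, self.start + j)

        def materialize(self):
            # Plays the same role as the .simplify() idea above: pay for
            # one copy and stop keeping a possibly huge base string alive.
            return self.base[self.start:self.stop]

    v = StrView(u"lazy strings are fun").slice(5, 12)
    assert v.materialize() == u"strings"

Because the view is a separate type, the base unicode representation stays simple and PyUnicode_AS_UNICODE() never has to fail, which is the property this thread keeps circling around.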
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Fri Jan 12 21:11:13 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 12:11:13 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None >Priority: 9 Private: No Submitted By: Martin Kammerhofer (mkam) >Assigned to: Neal Norwitz (nnorwitz) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-12 21:11 Message: Logged In: YES user_id=11105 Originator: NO Neal, I think this can go into the release25-maint branch since it repairs the ctypes.util.find_library function on BSD systems. What do you think? ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 12:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails.) I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running Freebsd6.0, NetBSD3.0, and OpenBSD3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows: FreeBSD6.0: libc.so.6, libm.so.4 NetBSD3.0: libc.so.12, libm.so.0 OpenBSD3.9: libc.so.39.0, libm.so.2.1 If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? 
(Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed-length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.) File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the aout executable format.) Unfortunately my freebsd patch has the assumption of a single version number built in; more specifically the cmp(* map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, using a more general sort algorithm, i.e. sorting by all trailing /(\.\d+)+/ fields. Said output from NetBSD is welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 on OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look?
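The more general ordering mkam describes above — comparing by every trailing numeric field rather than only the last one — can be sketched as a Python sort key. The function name and sample data are illustrative, not taken from the patch:

    def _version_key(libname):
        # Gather every trailing dot-separated numeric field, so that
        # "libfoo.so.8.9" -> (8, 9) and "libfoo.so.10" -> (10,); tuple
        # comparison then ranks libfoo.so.10 above libfoo.so.8.9.
        fields = []
        for field in reversed(libname.split(".")):
            if not field.isdigit():
                break
            fields.append(int(field))
        return tuple(reversed(fields))

    candidates = ["libc.so.5", "libc.so.39.0", "libc.so.6"]
    assert max(candidates, key=_version_key) == "libc.so.39.0"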
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Fri Jan 12 21:21:28 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 12:21:28 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Martin Kammerhofer (mkam) Assigned to: Neal Norwitz (nnorwitz) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-12 21:21 Message: Logged In: YES user_id=11105 Originator: NO Committed into trunk as revision 53402. Thanks for the patch and the work on it. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 21:11 Message: Logged In: YES user_id=11105 Originator: NO Neal, I think this can go into the release25-maint branch since it repairs the ctypes.util.find_library function on BSD systems. What do you think? ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 12:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails.) I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running Freebsd6.0, NetBSD3.0, and OpenBSD3.9. 
The output from "print find_library('c'), find_library('m')" on these systems is as follows: FreeBSD6.0: libc.so.6, libm.so.4 NetBSD3.0: libc.so.12, libm.so.0 OpenBSD3.9: libc.so.39.0, libm.so.2.1 If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is to different.) File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared library. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD has used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the aout executable format.) Unfortunately my freebsd patch has the assumption of a single version number built in; more specifically the cmp(* map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based an the last dot separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what looks its sys.platform like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 in OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Fri Jan 12 21:48:54 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 12:48:54 -0800 Subject: [Patches] [ python-Patches-1617699 ] slice-object support for ctypes Pointer/Array Message-ID: Patches item #1617699, was opened at 2006-12-18 05:28 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617699&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Heller (theller) Summary: slice-object support for ctypes Pointer/Array Initial Comment: Support for slicing ctypes' Pointer and Array types with slice objects, although only for the step=1 case. (Backported from p3yk-noslice branch.) ---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-12 21:48 Message: Logged In: YES user_id=11105 Originator: NO Thomas, a question: Since steps != 1 are not supported, does this patch have any value? IIUC, array[x:y] returns exactly the same as array[x:y:1] for all x and y values. Formally, the patch is missing unittests and documentation ;-). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:45 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617699&group_id=5470 From noreply at sourceforge.net Sat Jan 13 01:03:22 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 16:03:22 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted.
There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to to string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). 
Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, it would be acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. 
No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z).
Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemberg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often references using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. 
if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sat Jan 13 03:32:45 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 12 Jan 2007 18:32:45 -0800 Subject: [Patches] [ python-Patches-1634499 ] Py3k: Fix pybench so it runs Message-ID: Patches item #1634499, was opened at 2007-01-13 02:32 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634499&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tests Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Py3k: Fix pybench so it runs Initial Comment: This patch fixes pybench so it runs under the current Py3k trunk. I don't claim to have done the right thing, or even that my patch should be accepted. I submit it only in the hope that it's useful to somebody. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634499&group_id=5470 From noreply at sourceforge.net Sat Jan 13 15:39:48 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 06:39:48 -0800 Subject: [Patches] [ python-Patches-1563842 ] platform.py support for IronPython Message-ID: Patches item #1563842, was opened at 2006-09-23 03:59 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563842&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: M.-A. Lemburg (lemburg) Summary: platform.py support for IronPython Initial Comment: The following patch supplies minimal support for IronPython in platform.py - it makes the sys.version parsing not choke and die. There's a bunch of missing information from IronPython's sys.version string, not much that can be done there. Should platform.py grow an 'implementation' option, so it can detect whether it's IronPython, CPython, Jython, or something else? Patch is against svn trunk. 
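One plausible shape for the 'implementation' detection asked about above, based only on the sys.version and sys.platform conventions quoted in this thread; the actual check committed to platform.py may well differ:

    import sys

    def python_implementation():
        # IronPython and Jython identify themselves in sys.version and
        # sys.platform respectively; CPython's sys.version starts
        # directly with the version number.
        if sys.version.startswith("IronPython"):
            return "IronPython"
        if sys.platform.startswith("java"):
            return "Jython"
        return "CPython"

    print(python_implementation())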
---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-13 15:39 Message: Logged In: YES user_id=38388 Originator: NO sanxiyn: What do the extra numbers after the 1.0 stand for ? Do those correspond to branch and revision ? Armin: I'll add support for sys.version_info and sys.subversion as well. ---------------------------------------------------------------------- Comment By: Seo Sanghyeon (sanxiyn) Date: 2006-10-10 04:35 Message: Logged In: YES user_id=837148 The current patch doesn't parse sys.version from IronPython 1.0.1. IronPython 1.0 gives: IronPython 1.0.60816 on .NET 2.0.50727.42 IronPython 1.0.1 gives: IronPython 1.0 (1.0.61005.1977) on .NET 2.0.50727.42 ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2006-10-10 00:48 Message: Logged In: YES user_id=4771 Python2.5 has grown a sys.subversion attribute: ('CPython', 'trunk', '51999') The first field is intended to describe the exact implementation of Python. platform.py could return this if it is available. It should also probably try to use sys.version_info instead of, or in addition to, using a regexp on sys.version. One can hope that in the long term the version_info and the subversion attributes should eventually be supported by all Python implementation (PyPy...). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-09-25 12:30 Message: Logged In: YES user_id=38388 Thanks. I'll install IronPython and see what else needs to be done. I've already added a few fixes to make Jython play nice with platform.py that I'll check in as well. And yes: I'll add a python_implementation() function that returns 'CPython', 'Jython' and 'IronPython' as appropriate. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563842&group_id=5470 From noreply at sourceforge.net Sat Jan 13 18:39:40 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 09:39:40 -0800 Subject: [Patches] [ python-Patches-1634778 ] Add aliases for latin7/9/10 charsets Message-ID: Patches item #1634778, was opened at 2007-01-13 18:39 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634778&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Christoph Zwerschke (cito) Assigned to: Nobody/Anonymous (nobody) Summary: Add aliases for latin7/9/10 charsets Initial Comment: This patch adds the latin-7, latin-9 and latin-10 aliases in some places where they were missing (see http://mail.python.org/pipermail/python-list/2006-December/416921.html). 
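The effect of the patch can be approximated at runtime through the alias table the codec machinery consults. The alias keys below are illustrative, not the patch's exact additions; the underlying mappings are the standard ones (latin-7 = ISO 8859-13, latin-9 = ISO 8859-15, latin-10 = ISO 8859-16):

    import codecs
    import encodings.aliases

    encodings.aliases.aliases.update({
        "latin_7": "iso8859_13",
        "latin_9": "iso8859_15",
        "latin_10": "iso8859_16",
    })

    # The codec machinery normalizes "latin-9" to "latin_9" before
    # consulting the alias table, so the dashed spelling resolves too.
    assert codecs.lookup("latin-9").name == "iso8859-15"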
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634778&group_id=5470 From noreply at sourceforge.net Sat Jan 13 21:50:28 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 12:50:28 -0800 Subject: [Patches] [ python-Patches-1352731 ] Small upgrades to platform.platform() Message-ID: Patches item #1352731, was opened at 2005-11-10 02:19 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1352731&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: daishi harada (daishiharada) Assigned to: M.-A. Lemburg (lemburg) Summary: Small upgrades to platform.platform() Initial Comment: This patch updates platform.platform() to recognize some more Linux distributions. In addition, for RedHat-like distributions, will use the contents of the /etc/ to determine distname. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-13 21:50 Message: Logged In: YES user_id=38388 Originator: NO I'll add a new API linux_distribution() which will provide the more detailed information and also add support for Rocks (as well as a few others). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-11-09 08:49 Message: Logged In: YES user_id=38388 I'm currently working on an updated version of platform.py that will include part of this patch, patch #1563842 for IronPython and better support for Jython. ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2006-11-09 06:21 Message: Logged In: YES user_id=21627 Marc-Andre, would you rather accept or reject this patch, because of the incompatibility? ---------------------------------------------------------------------- Comment By: daishi harada (daishiharada) Date: 2006-10-10 01:22 Message: Logged In: YES user_id=493197 Thanks for the response. If by "break" you mean that for redhat-like distros the output of `python platform.py` would no longer necessarily be the same after the patch is applied, yes, that's true. However, that was the primary motivation for the patch - the current platform.py wasn't sufficiently discriminating for my purposes. In particular, the current platform.py ignores the first "field" of the contents of /etc/redhat-release, which I believe for ROCKS was the only portion which was changed from the redhat version on which it was based. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-10-09 20:31 Message: Logged In: YES user_id=38388 Sorry for the late reply. I must have missed the initial SF mail. I've had a look at the patch, but I'm not sure whether it can be accepted: wouldn't it break already recognized RedHat-like platforms ? ---------------------------------------------------------------------- Comment By: daishi harada (daishiharada) Date: 2005-11-10 02:23 Message: Logged In: YES user_id=493197 assigning to lemberg as suggested in the file. 
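A sketch of the parsing daishiharada describes, keeping the leading name field of /etc/redhat-release instead of discarding it; the regular expression and helper name are illustrative, not the committed code:

    import re

    _release_re = re.compile(r"(.+?)\s+release\s+([\d.]+)\s*(?:\((.*)\))?")

    def parse_release_line(line):
        # Keep the leading name field too: it is the only part that
        # rebranded RedHat derivatives such as ROCKS actually change.
        m = _release_re.match(line.strip())
        return m.groups() if m else None

    assert parse_release_line("Red Hat Linux release 7.3 (Valhalla)") == \
        ("Red Hat Linux", "7.3", "Valhalla")
    # A rebranded line such as "Rocks release 4.2.1 (...)" now yields
    # "Rocks" as the distribution name instead of being misreported.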
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1352731&group_id=5470 From noreply at sourceforge.net Sat Jan 13 22:14:31 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 13:14:31 -0800 Subject: [Patches] [ python-Patches-675976 ] mhlib does not obey MHCONTEXT env var Message-ID: Patches item #675976, was opened at 2003-01-28 10:16 Message generated for change (Comment added) made by sjoerd You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675976&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Sjoerd Mullender (sjoerd) Assigned to: Nobody/Anonymous (nobody) Summary: mhlib does not obey MHCONTEXT env var Initial Comment: All programs in the (N)MH suite of programs use the MHCONTEXT environment variable to find the so-called context file where the current folder is remembered. mhlib should do the same, so that it can be used in combination with the standard (N)MH programs. Also, when writing the context file, mhlib should replace the Current-Folder line but keep the other lines in tact. The attached patch fixes both problems. It introduces a new method for the class MH called getcontextfile which uses the MHCONTEXT environment variable to find the context file, and it uses the already existing function updateline to update the context file. Some questions concerning this patch: - should I document the new method or should it be an internal method only? - should the fix be ported to older Python versions? With the patch it does behave differently if you have an MHCONTEXT environment variable. ---------------------------------------------------------------------- >Comment By: Sjoerd Mullender (sjoerd) Date: 2007-01-13 22:14 Message: Logged In: YES user_id=43607 Originator: YES I have added a line to the docstring and I have added a method description to the library reference. Other than those changes, the new patch is identical to the old. I can check this in if you want. File Added: mhlib.patch ---------------------------------------------------------------------- Comment By: A.M. Kuchling (akuchling) Date: 2006-12-22 19:00 Message: Logged In: YES user_id=11375 Originator: NO The patch looks OK. Regarding your questions: 1) I think the method should be documented; it might be useful to subclasses of MH. 2) New feature, so 2.6 only. ---------------------------------------------------------------------- Comment By: Sjoerd Mullender (sjoerd) Date: 2003-02-13 10:48 Message: Logged In: YES user_id=43607 I can assure you that I did check that checkmark. Maybe it's my browser in combination with SF. We'll see if it works this time. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2003-02-13 04:02 Message: Logged In: YES user_id=33168 There's no uploaded file! You have to check the checkbox labeled "Check to Upload & Attach File" when you upload a file. Please try again. (This is a SourceForge annoyance that we can do nothing about. 
:-( ) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=675976&group_id=5470 From noreply at sourceforge.net Sat Jan 13 23:33:46 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 14:33:46 -0800 Subject: [Patches] [ python-Patches-1563842 ] platform.py support for IronPython Message-ID: Patches item #1563842, was opened at 2006-09-23 03:59 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563842&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: M.-A. Lemburg (lemburg) Summary: platform.py support for IronPython Initial Comment: The following patch supplies minimal support for IronPython in platform.py - it makes the sys.version parsing not choke and die. There's a bunch of missing information from IronPython's sys.version string, not much that can be done there. Should platform.py grow an 'implementation' option, so it can detect whether it's IronPython, CPython, Jython, or something else? Patch is against svn trunk. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-13 23:33 Message: Logged In: YES user_id=38388 Originator: NO Checked in a version that also supports IronPython, including the 1.0.1 version. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-13 15:39 Message: Logged In: YES user_id=38388 Originator: NO sanxiyn: What do the extra numbers after the 1.0 stand for ? Do those correspond to branch and revision ? Armin: I'll add support for sys.version_info and sys.subversion as well. ---------------------------------------------------------------------- Comment By: Seo Sanghyeon (sanxiyn) Date: 2006-10-10 04:35 Message: Logged In: YES user_id=837148 The current patch doesn't parse sys.version from IronPython 1.0.1. IronPython 1.0 gives: IronPython 1.0.60816 on .NET 2.0.50727.42 IronPython 1.0.1 gives: IronPython 1.0 (1.0.61005.1977) on .NET 2.0.50727.42 ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2006-10-10 00:48 Message: Logged In: YES user_id=4771 Python2.5 has grown a sys.subversion attribute: ('CPython', 'trunk', '51999') The first field is intended to describe the exact implementation of Python. platform.py could return this if it is available. It should also probably try to use sys.version_info instead of, or in addition to, using a regexp on sys.version. One can hope that in the long term the version_info and the subversion attributes should eventually be supported by all Python implementation (PyPy...). ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-09-25 12:30 Message: Logged In: YES user_id=38388 Thanks. I'll install IronPython and see what else needs to be done. I've already added a few fixes to make Jython play nice with platform.py that I'll check in as well. And yes: I'll add a python_implementation() function that returns 'CPython', 'Jython' and 'IronPython' as appropriate. 
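A regular expression covering both sys.version formats quoted by sanxiyn might look like the sketch below; it is illustrative, not the expression lemburg checked in:

    import re

    _ironpython_re = re.compile(
        r"IronPython\s+([\d.]+)"       # marketing version
        r"(?:\s+\(([\d.]+)\))?"        # optional detailed build number
        r"\s+on\s+(.+)")               # underlying runtime

    for line in ("IronPython 1.0.60816 on .NET 2.0.50727.42",
                 "IronPython 1.0 (1.0.61005.1977) on .NET 2.0.50727.42"):
        version, build, runtime = _ironpython_re.match(line).groups()
        print((version, build, runtime))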
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563842&group_id=5470 From noreply at sourceforge.net Sun Jan 14 00:21:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 15:21:11 -0800 Subject: [Patches] [ python-Patches-1620174 ] Improve platform.py usability on Windows Message-ID: Patches item #1620174, was opened at 2006-12-21 15:49 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1620174&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Luke Dunstan (infidel) Assigned to: M.-A. Lemburg (lemburg) Summary: Improve platform.py usability on Windows Initial Comment: This patch modifies platform.py to remove most of the dependencies on pywin32, and use the standard ctypes and _winreg modules instead. It also adds support for Windows CE. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-14 00:21 Message: Logged In: YES user_id=38388 Originator: NO platform.py is used outside the Python distribution to check which Python version is being used (among other things). It has to run with Python versions as early as 1.5.2. That said, it's OK to have it use different ways of accessing the needed information, provided that the signatures and return values of the public APIs don't change. ---------------------------------------------------------------------- Comment By: Luke Dunstan (infidel) Date: 2007-01-01 07:25 Message: Logged In: YES user_id=30442 Originator: YES Why does platform.py need to be compatible with earlier versions of Python? The return types haven't changed, and I think the return values won't change because the same OS APIs are being used. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2006-12-31 19:49 Message: Logged In: YES user_id=38388 Originator: NO I haven't looked at the patch yet, so just a few general comments on changes to platform.py: * the code must continue to work with Python versions prior to 2.6 This means that ctypes and _winreg support may be added as an option, but removing pywin32 calls is not the right way to proceed. * changes in return type of the public and documented APIs are not possible If you have a need for more information, then a new API should be added, or the information merged into one of the existing return fields. * changes in the return values of APIs due to use of different OS APIs must be avoided There's code out there relying on the return values, so if in doubt a new API must be provided. ---------------------------------------------------------------------- Comment By: Luke Dunstan (infidel) Date: 2006-12-31 06:57 Message: Logged In: YES user_id=30442 Originator: YES 1. Yes this is intended for 2.6 2. The only difference between win32api.RegQueryValueEx and _winreg.QueryValueEx seems to be that the latter returns Unicode strings. I have adjusted the patch to be more compatible with the old behaviour. 3. I have updated the doc string in the new patch. 
File Added: platform-wince-2.diff ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2006-12-31 01:13 Message: Logged In: YES user_id=764593 Originator: NO ( win32api.RegQueryValueEx is _winreg.QueryValueEx ) ? If not, it should wait for 2.6, and there should be an entry in what's new. (I suppose similar concerns exist for other return classes.) The change to win32_ver only half-corrects the return type to the four-tuple. The meaning of release (even if it is just "release name") should be specified in the text. def win32_ver(release='',version='',csd='',ptype=''): """ Get additional version information from the Windows Registry - and return a tuple (version,csd,ptype) referring to version + and return a tuple (release,version,csd,ptype) referring to version number, CSD level and OS type (multi/single processor). ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1620174&group_id=5470 From noreply at sourceforge.net Sun Jan 14 01:27:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 16:27:56 -0800 Subject: [Patches] [ python-Patches-1619846 ] Bug fixes for int unification branch Message-ID: Patches item #1619846, was opened at 2006-12-20 21:36 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1619846&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Adam Olsen (rhamphoryncus) Assigned to: Guido van Rossum (gvanrossum) Summary: Bug fixes for int unification branch Initial Comment: This patch should fix all the real bugs in the int unification branch. All the remaining bugs are either external to the branch or due to tests that need updating (mostly due to the names of int vs long). External bugs: test_socket: http://sourceforge.net/tracker/index.php?func=detail&aid=1619659&group_id=5470&atid=105470 test_class: seems to be caused by using new-style classes by default. Unrelated to int-unification. test_set: inheritance of __hash__. I believe this was fixed in p3yk already. Test failures due to naming differences: test_ctypes test_doctest test_generators test_genexps test_optparse test_pyexpat Tests needing updating, not just due to name differences: test_descr test_pickletools The following aspects need specific review: PyLong_FromVoidPtr was doing the cast wrong. GCC was compiling the (unsigned Py_LONG_LONG)p cast in such a way as to produce a value larger than 2**32, obviously wrong on this 32bit box, and it warned about the cast too. Making it cast to Py_uintptr_t first seems to have corrected both the behaviour and the warning, but may be wrong on other architectures. Many of my changes to use PyInt_CheckExact may be better served by creating a PyInt_CheckSmall macro that retains the range check but allows subclasses. Alternatively, the index interface could be used, but would require more rewriting perhaps best left until later. There are some areas that handled signed vs unsigned and int vs long a bit differently, and they may still need work. Hard to tell what behaviour is correct in such cases. 
Skipped files: Doc/ext/run-func.c Mac/Modules/ctl/_Ctlmodule.c Mac/Modules/dlg/_Dlgmodule.c Mac/Modules/win/_Winmodule.c Mac/Modules/pycfbridge.c Modules/carbonevt/_CarbonEvtmodule.c Modules/_sqlite/connection.c Modules/almodule.c Modules/cgensupport.c Modules/clmodule.c Modules/flmodule.c Modules/grpmodule.c Modules/posicmodule.c:conv_confname Modules/pyexpat.c Modules/svmodule.c Modules/termios.c Modules/_bsddb.c Modules/_sqlite/statement.c PC/_winreg.c Python/dynload_beos.c Python/mactoolboxglue.c Python/marshal.c Python/pythonrun.c:handle_system_exit RISCOS/Modules/drawfmodule.c RISCOS/Modules/swimodule.c ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 19:27 Message: Logged In: YES user_id=6380 Originator: NO For now I'm just going to submit this; then I'll think about the implications later. My highest priority is to get this merged back into the p3yk branch, although I have no idea how to do that yet... ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 19:05 Message: Logged In: YES user_id=6380 Originator: NO I'll be taking over this branch. ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2006-12-21 16:22 Message: Logged In: YES user_id=21627 Originator: NO Not this year anymore. I'll try to early next year (hopefully first week of January). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-21 16:12 Message: Logged In: YES user_id=6380 Originator: NO Martin, do you have time to look at this? I'll play with it too but I'd like to have your opinion. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1619846&group_id=5470 From noreply at sourceforge.net Sun Jan 14 00:54:57 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 15:54:57 -0800 Subject: [Patches] [ python-Patches-1634499 ] Py3k: Fix pybench so it runs Message-ID: Patches item #1634499, was opened at 2007-01-12 21:32 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634499&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tests Group: Python 3000 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: Py3k: Fix pybench so it runs Initial Comment: This patch fixes pybench so it runs under the current Py3k trunk. I don't claim to have done the right thing, or even that my patch should be accepted. I submit it only in the hope that it's useful to somebody. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 18:54 Message: Logged In: YES user_id=6380 Originator: NO Thanks, applied! 
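Returning to the int-unification item above: the PyLong_FromVoidPtr fix it describes — cast through a pointer-sized unsigned type before widening — can be modelled from Python with ctypes. addr_as_unsigned is an illustrative helper, not CPython code:

    import ctypes

    _PTR_BITS = 8 * ctypes.sizeof(ctypes.c_void_p)

    def addr_as_unsigned(obj):
        # Normalize through the pointer-sized unsigned range first (the
        # moral equivalent of casting through Py_uintptr_t), so a high
        # 32-bit address is neither sign-extended into a value above
        # 2**32 nor reported as a negative number.
        return id(obj) & ((1 << _PTR_BITS) - 1)

    print(hex(addr_as_unsigned(object())))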
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634499&group_id=5470 From noreply at sourceforge.net Sun Jan 14 01:05:09 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 16:05:09 -0800 Subject: [Patches] [ python-Patches-1619846 ] Bug fixes for int unification branch Message-ID: Patches item #1619846, was opened at 2006-12-20 21:36 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1619846&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None >Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Adam Olsen (rhamphoryncus) >Assigned to: Guido van Rossum (gvanrossum) Summary: Bug fixes for int unification branch Initial Comment: This patch should fix all the real bugs in the int unification branch. All the remaining bugs are either external to the branch or due to tests that need updating (mostly due to the names of int vs long). External bugs: test_socket: http://sourceforge.net/tracker/index.php?func=detail&aid=1619659&group_id=5470&atid=105470 test_class: seems to be caused by using new-style classes by default. Unrelated to int-unification. test_set: inheritance of __hash__. I believe this was fixed in p3yk already. Test failures due to naming differences: test_ctypes test_doctest test_generators test_genexps test_optparse test_pyexpat Tests needing updating, not just due to name differences: test_descr test_pickletools The following aspects need specific review: PyLong_FromVoidPtr was doing the cast wrong. GCC was compiling the (unsigned Py_LONG_LONG)p cast in such a way as to produce a value larger than 2**32, obviously wrong on this 32bit box, and it warned about the cast too. Making it cast to Py_uintptr_t first seems to have corrected both the behaviour and the warning, but may be wrong on other architectures. Many of my changes to use PyInt_CheckExact may be better served by creating a PyInt_CheckSmall macro that retains the range check but allows subclasses. Alternatively, the index interface could be used, but would require more rewriting perhaps best left until later. There are some areas that handled signed vs unsigned and int vs long a bit differently, and they may still need work. Hard to tell what behaviour is correct in such cases. Skipped files: Doc/ext/run-func.c Mac/Modules/ctl/_Ctlmodule.c Mac/Modules/dlg/_Dlgmodule.c Mac/Modules/win/_Winmodule.c Mac/Modules/pycfbridge.c Modules/carbonevt/_CarbonEvtmodule.c Modules/_sqlite/connection.c Modules/almodule.c Modules/cgensupport.c Modules/clmodule.c Modules/flmodule.c Modules/grpmodule.c Modules/posicmodule.c:conv_confname Modules/pyexpat.c Modules/svmodule.c Modules/termios.c Modules/_bsddb.c Modules/_sqlite/statement.c PC/_winreg.c Python/dynload_beos.c Python/mactoolboxglue.c Python/marshal.c Python/pythonrun.c:handle_system_exit RISCOS/Modules/drawfmodule.c RISCOS/Modules/swimodule.c ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 19:05 Message: Logged In: YES user_id=6380 Originator: NO I'll be taking over this branch. ---------------------------------------------------------------------- Comment By: Martin v. 
Löwis (loewis) Date: 2006-12-21 16:22 Message: Logged In: YES user_id=21627 Originator: NO Not this year anymore. I'll try to get to it early next year (hopefully first week of January). ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2006-12-21 16:12 Message: Logged In: YES user_id=6380 Originator: NO Martin, do you have time to look at this? I'll play with it too but I'd like to have your opinion. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1619846&group_id=5470 From noreply at sourceforge.net Sun Jan 14 08:35:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 23:35:50 -0800 Subject: [Patches] [ python-Patches-1635058 ] htonl et al accept negative ints Message-ID: Patches item #1635058, was opened at 2007-01-14 01:35 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: htonl et al accept negative ints Initial Comment: Referencing bug 1619659 This patch ensures that htonl and friends never accept or return negative numbers, per the underlying C implementation. I wrote a test case to ensure things work as expected, and ensured all tests pass. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 From noreply at sourceforge.net Sun Jan 14 00:59:51 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 15:59:51 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 04:37 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) >Assigned to: Guido van Rossum (gvanrossum) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer.
This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 18:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 19:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 12:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to to string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 01:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. 
The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 23:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still hold. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 23:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 22:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. 
However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 21:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 21:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A.
Lemburg (lemburg) Date: 2007-01-10 15:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 15:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this non-obvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 13:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2).
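A minimal sketch of the two patterns being compared (loop sizes are illustrative):

    # Quadratic: without a lazy or in-place optimization, each += may
    # copy everything accumulated so far.
    x = u""
    for i in range(10000):
        x += u"y"              # O(len(x)) copy per step -> O(n**2) total

    # Linear: collect the pieces, join once at the end.
    z = []
    for i in range(10000):
        z.append(u"y")
    x = u"".join(z)            # a single O(n) pass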
While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z). Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 20:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 20:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 13:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 05:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them?
(currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 00:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 14 11:42:56 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 02:42:56 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) >Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time! > - Style: you set your tab stops to 4 spaces. That is an absolute > no-no! Sorry about that; I'll fix it if I resubmit. > - Segfault in test_array. It seems that it's receiving a unicode > slice object and treating it like a "classic" unicode object. I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? 
I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case. > - I got it to come to a grinding halt with the following worst-case > scenario: > > a = [] > while True: > x = u"x"*1000000 > x = x[30:60] # Short slice of long string > a.append(x) > > If you can't do better than that, I'll have to reject it. > > PS I used your combined patch, if it matters. It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60].simplify() # Short slice of long string a.append(x) .simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. 
The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to to string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still hold. (Besides, all three of those files will probably go away before Py3k ships.) 
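As a toy illustration of the worst-case retention problem discussed in these comments (pure Python; LazySlice is an invented stand-in for the patch's slice objects, not its real implementation):

    class LazySlice(object):
        # Toy model: remembers the base string instead of copying it.
        def __init__(self, base, start, stop):
            self.base, self.start, self.stop = base, start, stop
        def simplify(self):
            # Force a real copy, releasing the reference to the big base.
            return self.base[self.start:self.stop]

    big = u"x" * 1000000
    s = LazySlice(big, 30, 60)   # only 30 characters of interest...
    del big                      # ...but the 1 MB buffer stays alive via s.base
    s = s.simplify()             # now only the short copy remains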
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain that I thought, and anyway I figure the likelyhood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. 
Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. 
I'm not convinced that murking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO >From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) . Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). 
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemberg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often references using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? 
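A toy pure-Python model of the cost profile Larry describes (an invented class, not the patch itself): concatenation is O(1), the first access pays one O(n) rendering pass over all characters, and indexing is O(1) afterwards.

    class LazyConcat(object):
        def __init__(self, pieces):
            self.pieces = pieces               # strings and/or LazyConcat nodes
            self.rendered = None
        def __add__(self, other):
            return LazyConcat([self, other])   # O(1): build a two-node tree
        def _render(self):
            if self.rendered is None:
                parts, stack = [], [self]
                while stack:                   # walk the concatenation tree once
                    node = stack.pop()
                    if isinstance(node, LazyConcat):
                        stack.extend(reversed(node.pieces))
                    else:
                        parts.append(node)
                self.rendered = u"".join(parts)   # one O(n) pass
            return self.rendered
        def __getitem__(self, i):
            return self._render()[i]           # O(n) the first time, then O(1)

    s = LazyConcat([u"a"]) + u"b" + u"c"
    assert s[1] == u"b"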
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 14 12:44:50 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 03:44:50 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-14 11:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario: #define MAX_SLICE_DELTA (64*1024) if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) || (size_of_slice > (size_of_original / 2)) ) use_lazy_slice(); else create_string_as_normal(); You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time! > - Style: you set your tab stops to 4 spaces. That is an absolute > no-no! Sorry about that; I'll fix it if I resubmit. > - Segfault in test_array. It seems that it's receiving a unicode > slice object and treating it like a "classic" unicode object. I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. 
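The cutoff proposed at the top of this message can be modelled in pure Python (same illustrative constant as in that sketch; the names come from it, not from any released API):

    MAX_SLICE_DELTA = 64 * 1024

    def use_lazy_slice(size_of_slice, size_of_original):
        # Lazy only when the slice is nearly as large as the original,
        # so a short slice can never pin a huge buffer.
        return (size_of_slice + MAX_SLICE_DELTA > size_of_original
                or size_of_slice > size_of_original // 2)

    assert not use_lazy_slice(30, 1000000)   # the worst case above: copies eagerly
    assert use_lazy_slice(900000, 1000000)   # big slice: laziness still pays off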
Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case. > - I got it to come to a grinding halt with the following worst-case > scenario: > > a = [] > while True: > x = u"x"*1000000 > x = x[30:60] # Short slice of long string > a.append(x) > > If you can't do better than that, I'll have to reject it. > > PS I used your combined patch, if it matters. It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60].simplify() # Short slice of long string a.append(x) .simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. 
My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to to string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still hold. (Besides, all three of those files will probably go away before Py3k ships.) 
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-12 04:25
Message: Logged In: YES user_id=364875 Originator: YES

File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-12 03:12
Message: Logged In: YES user_id=364875 Originator: YES

Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds.

As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html

One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away.

As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.)

I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.

File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-12 02:50
Message: Logged In: YES user_id=364875 Originator: YES

File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-12 02:42
Message: Logged In: YES user_id=364875 Originator: YES

lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay.

So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return.
Document this with strong language for external C module authors.

2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

4. The patch is not accepted.

Of course, I'm open to suggestions of other approaches. (Not to mention patches!)

Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does.

As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination.

* Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2007-01-10 20:59
Message: Logged In: YES user_id=38388 Originator: NO

Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API.

Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory).

A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string.
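(As an aside, the "slice integers" technique lemburg mentions can be sketched in a few lines of Python. This is only an illustration of the idea, not code from mxTextTools or from any patch in this thread; find_words and the span list are invented for the example:)

    # Record (start, end) offsets into the base string instead of creating
    # a substring per token; materialize substrings only when needed.
    def find_words(text):
        spans = []
        i, n = 0, len(text)
        while i < n:
            if text[i].isspace():
                i += 1
                continue
            j = i
            while j < n and not text[j].isspace():
                j += 1
            spans.append((i, j))   # a "slice integer" pair; no new string yet
            i = j
        return spans

    text = u"the quick brown fox"
    spans = find_words(text)               # [(0, 3), (4, 9), (10, 15), (16, 19)]
    words = [text[s:e] for s, e in spans]  # pay for the strings only here

The point of keeping offsets is exactly the one lemburg makes: no per-token object overhead and no short-string allocations while parsing.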
I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design.

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-10 20:30
Message: Logged In: YES user_id=364875 Originator: YES

Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation.

Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom.

For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed.

And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-01-10 18:24
Message: Logged In: YES user_id=341410 Originator: NO

From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) .

Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data.

Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode?

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-09 01:26
Message: Logged In: YES user_id=364875 Originator: YES

Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)

File Added: lch.py3k.unicode.lazy.concat.patch.3.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-09 01:10
Message: Logged In: YES user_id=364875 Originator: YES

Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode().
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt

----------------------------------------------------------------------

Comment By: Larry Hastings (lhastings)
Date: 2007-01-08 18:50
Message: Logged In: YES user_id=364875 Originator: YES

jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup.

If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate.

lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2007-01-08 10:59
Message: Logged In: YES user_id=38388 Originator: NO

While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-01-07 05:08
Message: Logged In: YES user_id=341410 Originator: NO

What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
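(The cost model under discussion can be illustrated with a small pure-Python sketch. This shows only the semantics Larry describes, not the patch's C implementation; the LazyConcat class below is invented for the example:)

    class LazyConcat(object):
        """Illustration only: defer joining until a character is needed."""
        def __init__(self, left, right):
            self.pieces = [left, right]   # operands, possibly lazy themselves
            self.value = None             # rendered form, filled in lazily

        def __add__(self, other):         # O(1): just remember the operand
            return LazyConcat(self, other)

        def render(self):                 # O(total length), paid once
            if self.value is None:
                parts = [p.render() if isinstance(p, LazyConcat) else p
                         for p in self.pieces]
                self.value = u"".join(parts)
            return self.value

        def __getitem__(self, i):         # first call renders; then O(1)
            return self.render()[i]

    s = LazyConcat(u"a + b", u" is O(1); ") + u"s[i] renders once"
    print(s[0], len(s.render()))

So the answer to Josiah's question under this model: "+" is O(1), the first indexing operation pays the full linear rendering cost, and every indexing operation after that is O(1) again.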
----------------------------------------------------------------------

You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Sun Jan 14 18:04:58 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 14 Jan 2007 09:04:58 -0800
Subject: [Patches] [ python-Patches-1635058 ] htonl et al accept negative ints
Message-ID: 

Patches item #1635058, was opened at 2007-01-14 02:35 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: htonl et al accept negative ints

Initial Comment: Referencing bug 1619659 This patch ensures that htonl and friends never accept or return negative numbers, per the underlying C implementation. I wrote a test case to ensure things work as expected, and ensured all tests pass.

----------------------------------------------------------------------

>Comment By: Guido van Rossum (gvanrossum)
Date: 2007-01-14 12:04
Message: Logged In: YES user_id=6380 Originator: NO

Thanks, submitted. (Note that I had to fix the indentation in your patch; you used four spaces where the original code used tabs. Please be consistent!) Can you check if there's a need to update the docs? If there is, send me a doc patch and I'll apply it.

----------------------------------------------------------------------

You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470

From noreply at sourceforge.net Sun Jan 14 21:31:32 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 14 Jan 2007 12:31:32 -0800
Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax
Message-ID: 

Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax

Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly:

    def f(arg:expr, (nested1:expr, nested2:expr)) -> expr:
        suite

The function object has a new attribute, func_annotations, that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower.
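(For reference, here is what the proposed syntax looks like in practice. The example below uses the __annotations__ spelling under which the feature eventually shipped in Python 3 via PEP 3107; the patch in this thread spells the attribute func_annotations, and shipped Python 3 also dropped the nested-tuple parameter form:)

    # Annotations are arbitrary expressions, stored in a dict keyed by
    # argument name, with the return annotation under the key 'return'.
    def parse(data: bytes, strict: bool = True) -> str:
        return data.decode("ascii" if strict else "latin-1")

    print(parse.__annotations__)
    # {'data': <class 'bytes'>, 'strict': <class 'bool'>, 'return': <class 'str'>}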
----------------------------------------------------------------------

>Comment By: Tony Lownds (tonylownds)
Date: 2007-01-14 20:31
Message: Logged In: YES user_id=24100 Originator: YES

Combines the code paths for MAKE_FUNCTION and MAKE_CLOSURE. Fixes a crash where functions with closures and either annotations or keyword-only arguments result in MAKE_CLOSURE, but only MAKE_FUNCTION has the code to handle annotations or keyword-only arguments. Includes enough tests to trigger the bug.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2007-01-06 21:03
Message: Logged In: YES user_id=24100 Originator: YES

I tried to implement getargspec() as described, and unfortunately there is another wrinkle to consider. Keyword-only arguments may or may not have defaults. So the invariant described in getargspec()'s docstring can't be maintained when simply appending keyword-only arguments.

A tuple of four things is returned: (args, varargs, varkw, defaults). 'args' is a list of the argument names (it may contain nested lists). 'args' will include keyword-only argument names. 'varargs' and 'varkw' are the names of the * and ** arguments or None. 'defaults' is an n-tuple of the default values of the last n arguments.

The attached patch adds a 'getfullargspec' API that returns complete information; 'getargspec' raises an error if information would be lost; the order of arguments in 'formatargspec' is backwards compatible, so that formatargspec(*getargspec(f)) == formatargspec(*getfullargspec(f)) when getargspec(f) does not raise an error. PEP 362 could and probably should replace the new getfullargspec() function, so I did not implement an API more complicated than a tuple.

File Added: pydoc.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2007-01-06 20:05
Message: Logged In: YES user_id=24100 Originator: YES

Change peepholer to not bail in the presence of EXTENDED_ARG + MAKE_FUNCTION. Enforce the natural 16-bit limit of annotations in compile.c.

File Added: peepholer_and_max_annotations.patch

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2007-01-04 17:53
Message: Logged In: YES user_id=6380 Originator: NO

I like the following approach: (1) the old API continues to work for all functions, but provides incomplete information (not losing the kw-only args completely, but losing the fact that they are kw-only); (2) add a new API that provides all the relevant information. Maybe the new API should not return a 7-tuple but rather a structure with named attributes; that makes it more future-proof. Sorry, I don't have any good suggestions for new names.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2007-01-04 07:12
Message: Logged In: YES user_id=24100 Originator: YES

For getargs and getargvalues, including the names in positional args is an excellent strategy. There are uses (in cgitb) in the stdlib for getargvalues that then wouldn't need to be changed. The 2 uses of getargspec in the stdlib (one of which I missed, in DocXMLRPCServer) are both closely followed by formatargspec. I think those APIs should change or information will be lost. Alternatively, a new function (hopefully with a better name than getfullargspec :) could be made and getargspec could retain its API, but raise an error when keyword-only arguments are present.
    def getargspec(func):
        args, varargs, kwonlyargs, kwdefaults, varkw, defaults, ann = getfullargspec(func)
        if kwonlyargs:
            raise ValueError("function has keyword-only arguments, use getfullargspec!")
        return args, varargs, varkw, defaults

I'll update the patch to fix getargvalues and DocXMLRPCServer this weekend.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2007-01-04 05:22
Message: Logged In: YES user_id=6380 Originator: NO

Well, it depends on the context whether that matters. The kw-only args could just be included in the positional args (which have names anyway) and that wouldn't be so bad for some apps.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2007-01-04 05:17
Message: Logged In: YES user_id=24100 Originator: YES

I think everyone will have to update their uses of getargspec and friends, because otherwise they will silently mis-handle keyword-only arguments.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2007-01-04 04:30
Message: Logged In: YES user_id=6380 Originator: NO

I'm not sure it's right to just change the signature of the various functions in inspect.py; that would break all existing code using that module (and there definitely are other users besides pydoc). It would be better to add new methods that provide access to the additional functionality. Or do you think that everyone will have to change their code anyway?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-12-28 06:53
Message: Logged In: YES user_id=33168 Originator: NO

I'm skipping the pydoc patch. Didn't even look at it. I don't have the refleak, but I changed some calls and may have fixed it. Committed revision 53170. Leaving open to deal with the pydoc patch.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-28 03:04
Message: Logged In: YES user_id=24100 Originator: YES

Nothing else on the C side of things. The pydoc patch works well for me; more tests ought to be added for function annotations and also for keyword-only arguments, but perhaps that can be added on as a later patch after checkin.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-12-28 01:38
Message: Logged In: YES user_id=6380 Originator: NO

Thanks! Is there anything else that you think needs to be done before I check this in? The core code looks alright to me; I can't be bothered with reviewing the ast stuff or the compiler package since I don't know enough about these, but given that it compiles things correctly I'm not so worried about those. What's the status of the pydoc patch? Are you still working on that?

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-28 01:28
Message: Logged In: YES user_id=24100 Originator: YES

Fixed in latest patch. Also added VISIT call for func_annotations.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-12-28 00:40
Message: Logged In: YES user_id=6380 Originator: NO

I believe I've found a leak in the code that adds annotations to a function object.
See this session:

    >>> x = object()
    >>> import sys
    >>> sys.getrefcount(x)
    2
    >>> for i in range(100):
    ...     def f(x: x): pass
    ...
    >>> del f
    >>> sys.getrefcount(x)
    102

At first I thought this could be due to the code added to the MAKE_FUNCTION opcode, but I don't see a leak there. More likely func_annotations is not being freed when a function object is deleted.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-23 19:05
Message: Logged In: YES user_id=24100 Originator: YES

Initial patch to implement keyword-only arguments and annotations support for pydoc and inspect. Tests do not exercise these features, yet. Output for annotations that are types is special cased so that for:

    def intmin(*a: int) -> int: pass

...help(intmin) will display:

    intmin(*a: int) -> int

File Added: pydoc.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-23 15:53
Message: Logged In: YES user_id=24100 Originator: YES

Fixed the non-C89 style lines and the formatting (hopefully in compatible style :)

File Added: opt_arg_ann.patch

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-12-22 21:41
Message: Logged In: YES user_id=6380 Originator: NO

Thanks for the progress! There are still a few lines ending in whitespace or lines that are longer than 80 chars (and weren't before). Mind cleaning those up? Also ceval.c:2305 and compile.c:1440 contain code that gcc 2.95 won't compile (the 'int' declarations ought to be moved to the start of the containing {...} block); I think this style is not C89 compatible.

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-22 20:15
Message: Logged In: YES user_id=24100 Originator: YES

Changes:
1. Fix crasher in Python/symtable.c -- annotations were visited inside the function scope
2. Fix Lib/compiler issues with Lib/test/test_complex_args. Output from Lib/compiler does not pass all tests, same failures as in HEAD of p3yk branch.

File Added: opt_arg_ann.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-21 20:21
Message: Logged In: YES user_id=24100 Originator: YES

Changes:
1. Address Neal's comments (I hope)
2. test_scope passes
3. Added some additional tests to test_compiler

Open implementation issues:
1. Output from Lib/compiler does not pass test_complex_args, test_scope, possibly more.

File Added: opt_arg_ann.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-20 22:13
Message: Logged In: YES user_id=24100 Originator: YES

Changes:
1. Updated to apply cleanly
2. Fix to compile.c so that test_complex_args passes

Open implementation issues:
1. Neal's comments
2. test_scope fails
3. Output from Lib/compiler does not pass test_complex_args

File Added: opt_arg_ann.patch

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-20 18:04
Message: Logged In: YES user_id=24100 Originator: YES

I'll work on code formatting and the error checking and other cleanup. Open to other names than tname and vname, I created those non-terminals in order to use the same code for processing "def" and "lambda". Terminals are caps IIUC. I did add a test for the multi-paren situation. 2.5 had that bug too.
Re: no changes to ceval, I tried generating the func_annotations dictionary using bytecodes. That doesn't change the ceval loop but was more code and was slower. So there is a way to avoid ceval changes. Re: deciding if lambda was going to require parens around the arguments, I don't think there was any decision, and yes annotations would be easily supportable. Happy to change if there is support, it's backwards incompatible. Re: return type syntax, I have only seen the -> syntax (vs a keyword 'as') on Guido's blog. Thanks for the comments! ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 09:25 Message: Logged In: YES user_id=33168 Originator: NO Nix this comment: I would definitely prefer the annotations baked into the code object so there are no changes to ceval. I see that Guido wants it the way it currently is which makes sense for nested functions. There should probably be a test with nested functions even though it really shouldn't be different. The test will verify that. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-12-20 08:38 Message: Logged In: YES user_id=33168 Originator: NO When regenerating the patch, can you also remove non-functional changes such as removing unneeded parens and whitespace changes. Also, please try to keep the same formatting in the file wrt tabs and spaces and don't move code around. I know this is a pain and inconsistent. I think I changed ast.c to be all 4 space indents with spaces only. In compiler_simple_arg(), don't you need to check if annotation is NULL when returned from ast_for_expr? Otherwise an undetected error would go through, wouldn't it? In compiler_complex_args(), don't you need to set the ast_error (or a SystemError) if the switch isn't a tname, vname, or LPAR? I don't like the names tname and vname. Also they seem inconsistent. Aren't all the other names all CAPS? In hunk, @@ -602,51 +625,75 @@ remove the commented out code. We shouldn't use any // style comments either. Can you improve the error msg for kwdefaults == NULL? (Thanks for adding it!) Check annotation for NULL if returned from ast_for_expr? BTW, the AST code in this area was tricky code which had some bugs. Did you test with adding extra parentheses and singleton tuples? I'm not sure if Guido preferred syntax -> vs a keyword 'as' for the return type. In symtable.c remove the printfs. They should probably be SystemErrors or something. I would definitely prefer the annotations baked into the code object so there are no changes to ceval. Did we decide if lambda was going to require parens around the arguments? If so, it could support annotations, right? (No comment on the usefulness of annotations for lambdas. :-) In compiler_visit_argannotation, you should return the result from PyList_Append and can remove the comment about checking for errors. Also, I believe the INCREF is not needed, it will be done by PyList_Append. Same deal with returning result of compiler_visit_argannotations() (the one with an s). Need to check for PyList_New() returning NULL in compiler_visit_annotations(). Lots more error checking needs to be added in this area. Dammit, I really want to use Mondrian for these comments! (Sorry Tony, not your fault, I'm just having some bad memories at this point cause I have to keep providing the references.) This patch looks very complete in that it updates things like the compiler package and the parsermodule.c. Good job! 
This is a great start.

----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2006-12-20 01:22
Message: Logged In: YES user_id=6380 Originator: NO

Applying the patch fails, probably due to recent merge activities in the p3yk branch. Can I inconvenience you with a request to regenerate the patch from the branch head?

----------------------------------------------------------------------

Comment By: Jim Jewett (jimjjewett)
Date: 2006-12-11 17:29
Message: Logged In: YES user_id=764593 Originator: NO

Could you rename it to "argument annotations"? "optional argument" makes me think of the current keyword arguments, that can be but don't have to be passed. -jJ

----------------------------------------------------------------------

Comment By: Tony Lownds (tonylownds)
Date: 2006-12-04 01:24
Message: Logged In: YES user_id=24100 Originator: YES

This patch implements optional argument syntax for Python 3000. The patch still has issues:
1. test_ast and test_scope fail.
2. Running the test suite after compiling the library with the compiler package causes failures
3. no docs
4. C-code reference counts and error checking needs a review

The syntax implemented is roughly:

    def f(arg:expr, (nested1:expr, nested2:expr)) -> expr:
        suite

The function object has a new attribute, func_annotations, that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations.

The ast format has changed for the builtin compiler and the compiler package. A new token was added, '->' (called RARROW in token.h). token.py lost ERRORTOKEN after re-generating, I don't know why. I added it back manually.

----------------------------------------------------------------------

You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470

From noreply at sourceforge.net Sun Jan 14 21:32:14 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 14 Jan 2007 12:32:14 -0800
Subject: [Patches] [ python-Patches-1607548 ] Optional Argument Syntax
Message-ID: 

Patches item #1607548, was opened at 2006-12-02 20:53 Message generated for change (Comment added) made by tonylownds You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 3000 Status: Open Resolution: Accepted Priority: 5 Private: No Submitted By: Tony Lownds (tonylownds) Assigned to: Guido van Rossum (gvanrossum) Summary: Optional Argument Syntax

Initial Comment: This patch implements optional argument syntax for Python 3000. The patch still has issues; I am posting so that Collin Winters can add a link to the PEP. The syntax implemented is roughly:

    def f(arg:expr, (nested1:expr, nested2:expr)) -> expr:
        suite

The function object has a new attribute, func_annotations, that maps from argument names to the result of the expression. The return annotation is stored with a key of 'return'. Lambda's syntax doesn't support annotations. This patch alters the MAKE_FUNCTION opcode. I have an implementation that built the func_annotations dictionary in bytecode as well but it was bigger and slower.
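(The crash that make_closure_fix addresses, described in Tony's 2007-01-14 20:31 comment above, is triggered by code of roughly this shape. The example is invented to illustrate the trigger and uses the __annotations__ spelling of the Python 3 that eventually shipped:)

    def outer(base):
        def inner(y: int) -> int:   # annotated *and* a closure over 'base',
            return base + y         # so it compiles via MAKE_CLOSURE
        return inner

    add_two = outer(2)
    print(add_two(3))               # 5
    print(add_two.__annotations__)  # {'y': <class 'int'>, 'return': <class 'int'>}

Before the fix, only the MAKE_FUNCTION path knew how to pop annotations (and keyword-only defaults) off the stack, so combining them with a closure crashed.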
----------------------------------------------------------------------

>Comment By: Tony Lownds (tonylownds)
Date: 2007-01-14 20:32
Message: Logged In: YES user_id=24100 Originator: YES

File Added: make_closure_fix.patch

----------------------------------------------------------------------

You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1607548&group_id=5470

From noreply at sourceforge.net Sun Jan 14 22:57:30 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 14 Jan 2007 13:57:30 -0800
Subject: [Patches] [ python-Patches-1598415 ] Logging Module - followfile patch
Message-ID: 

Patches item #1598415, was opened at 2006-11-17 15:44 Message generated for change (Comment added) made by vsajip You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Modules Group: Python 2.5 >Status: Pending >Resolution: Fixed Priority: 5 Private: No Submitted By: chads (cjschr) Assigned to: Vinay Sajip (vsajip) Summary: Logging Module - followfile patch

Initial Comment: Pertaining to the FileHandler and the file being written to: It's possible that the file being written to will be rolled over by an external application such as newsyslog.
By default, FileHandler tracks the file descriptor, not the file. If the original file is renamed, the file descriptor is still updated; however, it's probably desired that continued updates to the original file take place instead. This patch adds an attribute to the FileHandler class constructor (and basicConfig kw as well). If the attribute evaluates to True, the filename, not the descriptor, is tracked. Basically, the code compares the file status from a previous emit call to the current call before the base class emit is called. If a difference in st_ino or st_dev is found, the current stream is flushed/closed and a new one, based on baseFilename, is created, file status is updated, and then the base class emit is called.

----------------------------------------------------------------------

>Comment By: Vinay Sajip (vsajip)
Date: 2007-01-14 21:57
Message: Logged In: YES user_id=308438 Originator: NO

WatchedFileHandler added to logging.handlers, checked into trunk. Documentation updated, too.

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2007-01-11 21:50
Message: Logged In: YES user_id=308438 Originator: NO

I've had a bit more of a think about this, and realised that I made a boo-boo in one of my earlier comments. Under Windows, log files are opened with exclusive locks, so that other processes cannot rename or move files which are open. So I believe the approach won't work at all under Windows. (Chad, sorry about making you redo the patch with ST_SIZE rather than ST_DEV and ST_INO). I also think this is a less common use case than warrants supporting it at the basicConfig() level, which is for really very basic usage configuration.

So I would advocate adding a WatchedFileHandler (in logging.handlers) which watches st_dev and st_ino (as per Chad's original patch) and closes the old file descriptor and reopens the file when a change is seen. Some recent changes checked into SVN trunk facilitate the reopening - I've added an _open() method to FileHandler to do this. Chad, what do you think of this approach?

----------------------------------------------------------------------

Comment By: chads (cjschr)
Date: 2006-11-20 17:06
Message: Logged In: YES user_id=1093928 Originator: YES

Uploaded the wrong diff. This is the correct one.

----------------------------------------------------------------------

Comment By: chads (cjschr)
Date: 2006-11-20 17:02
Message: Logged In: YES user_id=1093928 Originator: YES

Updated per vsajip to work on Windoze too. The code now checks for a current size < previous size (based on ST_SIZE).

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2006-11-19 20:32
Message: Logged In: YES user_id=308438 Originator: NO

This patch, relying as it does on Unix-specific details such as i-nodes, does not appear as if it will work under Windows. For that reason I will mark it as Pending and Invalid for now; if cjschr can update this tracker item with how the patch will work on Windows, I will look at it further. The SF system will automatically close it if no update is made to the item in approx. 2 weeks, though it can still be reopened after that.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-11-18 19:14
Message: Logged In: YES user_id=849994 Originator: NO

Assigning to Vinay.
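(The mechanism that was checked in can be sketched as follows. This is a simplified illustration of the st_dev/st_ino check described above, not the stdlib's actual WatchedFileHandler code; SimpleWatchedFileHandler is an invented name. It relies on the _open() method Vinay mentions adding to FileHandler:)

    import logging, os

    class SimpleWatchedFileHandler(logging.FileHandler):
        """Reopen baseFilename if the underlying file was moved or rotated."""
        def __init__(self, filename):
            logging.FileHandler.__init__(self, filename)
            st = os.stat(self.baseFilename)
            self.dev, self.ino = st.st_dev, st.st_ino

        def emit(self, record):
            try:
                st = os.stat(self.baseFilename)
                changed = (st.st_dev, st.st_ino) != (self.dev, self.ino)
            except OSError:                  # file was renamed or removed
                changed = True
            if changed:
                self.stream.close()
                self.stream = self._open()   # reopens baseFilename
                st = os.stat(self.baseFilename)
                self.dev, self.ino = st.st_dev, st.st_ino
            logging.FileHandler.emit(self, record)

Comparing (st_dev, st_ino) rather than file size catches the rename/rotate case directly: after newsyslog moves the file aside, a fresh stat of the original name yields a different inode, and the handler reopens the path.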
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 From noreply at sourceforge.net Sun Jan 14 23:24:05 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 14:24:05 -0800 Subject: [Patches] [ python-Patches-1635058 ] htonl et al accept negative ints Message-ID: Patches item #1635058, was opened at 2007-01-14 01:35 Message generated for change (Comment added) made by mark-roberts You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Closed Resolution: Accepted Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: htonl et al accept negative ints Initial Comment: Referencing bug 1619659 This patch ensures that htonl and friends never accept or return negative numbers, per the underlying C implementation. I wrote a test case to ensure things work as expected, and ensured all tests pass. ---------------------------------------------------------------------- >Comment By: Mark Roberts (mark-roberts) Date: 2007-01-14 16:24 Message: Logged In: YES user_id=1591633 Originator: YES Hmm, I'll remember consistency when working with the C implementation. The Python that I've looked at seems to always use 4 spaces. At any rate, here's a doc patch. It essentially just makes "n bit integers" read "n bit positive integers". Other than that, I can think of no way to update the docs to reflect the scope of this patch. Thanks for everything, Guido! File Added: bug_g119659_doc.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 11:04 Message: Logged In: YES user_id=6380 Originator: NO Thanks, submitted. (Note that I had to fix the indentation in your patch; you used four spaces where the original code used tabs. Please be consistent!) Can you check if there's a need to update the docs? If there is, send me a doc patch and I'll apply it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 From noreply at sourceforge.net Mon Jan 15 01:02:49 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 16:02:49 -0800 Subject: [Patches] [ python-Patches-1635058 ] htonl et al accept negative ints Message-ID: Patches item #1635058, was opened at 2007-01-14 02:35 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Closed Resolution: Accepted Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: htonl et al accept negative ints Initial Comment: Referencing bug 1619659 This patch ensures that htonl and friends never accept or return negative numbers, per the underlying C implementation. 
I wrote a test case to ensure things work as expected, and ensured all tests pass. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 19:02 Message: Logged In: YES user_id=6380 Originator: NO Thanks, applied! ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-14 17:24 Message: Logged In: YES user_id=1591633 Originator: YES Hmm, I'll remember consistency when working with the C implementation. The Python that I've looked at seems to always use 4 spaces. At any rate, here's a doc patch. It essentially just makes "n bit integers" read "n bit positive integers". Other than that, I can think of no way to update the docs to reflect the scope of this patch. Thanks for everything, Guido! File Added: bug_g119659_doc.patch ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 12:04 Message: Logged In: YES user_id=6380 Originator: NO Thanks, submitted. (Note that I had to fix the indentation in your patch; you used four spaces where the original code used tabs. Please be consistent!) Can you check if there's a need to update the docs? If there is, send me a doc patch and I'll apply it. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635058&group_id=5470 From noreply at sourceforge.net Mon Jan 15 01:03:59 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 16:03:59 -0800 Subject: [Patches] [ python-Patches-1635454 ] CSV DictWriter Errors Message-ID: Patches item #1635454, was opened at 2007-01-14 18:03 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635454&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: CSV DictWriter Errors Initial Comment: In response to feature request 1634717. The DictWriter, with this patch, should return a list of all offending extraneous field names, instead of simply raising a failure. I could see a use case for this in error reporting. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635454&group_id=5470 From noreply at sourceforge.net Mon Jan 15 01:40:00 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 16:40:00 -0800 Subject: [Patches] [ python-Patches-1635473 ] strptime %F and %T directives Message-ID: Patches item #1635473, was opened at 2007-01-14 18:40 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: strptime %F and %T directives Initial Comment: In response to bug 1633628. %F and %T are valid directives. They are added to Lib/_strptime.py by expanding them into the equivalent %Y-%m-%d and %H:%M:%S sub-expressions. Includes a test case. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 From noreply at sourceforge.net Mon Jan 15 02:05:10 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 17:05:10 -0800 Subject: [Patches] [ python-Patches-1635473 ] strptime %F and %T directives Message-ID: Patches item #1635473, was opened at 2007-01-14 18:40 Message generated for change (Comment added) made by mark-roberts You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: strptime %F and %T directives Initial Comment: In response to bug 1633628. %F and %T are valid directives. They are added to Lib/_strptime.py by expanding them into the equivalent %Y-%m-%d and %H:%M:%S sub-expressions. Includes a test case. ---------------------------------------------------------------------- >Comment By: Mark Roberts (mark-roberts) Date: 2007-01-14 19:05 Message: Logged In: YES user_id=1591633 Originator: YES I took a look at the time documentation page, and it did not detail %F and %T, even though they were supported in strftime. I added them to the documentation page since strptime now supports them. File Added: bug_1633628_strptime_doc.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 From noreply at sourceforge.net Sun Jan 14 17:32:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 14 Jan 2007 08:32:33 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 04:37 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted.
There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 11:32 Message: Logged In: YES user_id=6380 Originator: NO Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 06:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario: #define MAX_SLICE_DELTA (64*1024) if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) || (size_of_slice > (size_of_original / 2)) ) use_lazy_slice(); else create_string_as_normal(); You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 05:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time! > - Style: you set your tab stops to 4 spaces. That is an absolute > no-no! 
Sorry about that; I'll fix it if I resubmit. > - Segfault in test_array. It seems that it's receiving a unicode > slice object and treating it like a "classic" unicode object. I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case. > - I got it to come to a grinding halt with the following worst-case > scenario: > > a = [] > while True: > x = u"x"*1000000 > x = x[30:60] # Short slice of long string > a.append(x) > > If you can't do better than that, I'll have to reject it. > > PS I used your combined patch, if it matters. It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60].simplify() # Short slice of long string a.append(x) .simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 18:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. 
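To make the memory behavior in that worst-case loop concrete, here is a toy pure-Python model of what an unrendered lazy slice holds onto (an illustration only, not code from the patch, which does this in C inside unicodeobject.c):

    class ToyLazySlice(object):
        # Toy stand-in for the patch's lazy slice objects.
        def __init__(self, base, start, stop):
            self.base = base                 # keeps the whole original string alive
            self.start, self.stop = start, stop
            self.value = None                # not rendered yet
        def simplify(self):
            if self.value is None:
                self.value = self.base[self.start:self.stop]  # copy the 30 chars
                self.base = None             # the megabyte buffer can now be freed
            return self.value

    a = []
    for i in range(5):
        x = ToyLazySlice(u"x" * 1000000, 30, 60)
        a.append(x)       # without simplify(), each entry pins ~1M characters

Each unrendered slice pins roughly a megabyte of Py_UNICODE data for a 30-character result, which is exactly the pathology the MAX_SLICE_DELTA heuristic proposed above is meant to bound.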
---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 19:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 12:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 01:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 23:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk.
lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still hold. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 23:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-11 22:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing. File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt
This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 15:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory).
A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 15:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this inobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 13:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) . Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 20:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1; it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)
File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 20:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 13:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 05:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 00:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
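A toy pure-Python model may make the cost model being asked about concrete (an illustration only, not the C implementation): + builds a tree node without copying characters, and the first access renders the whole tree once.

    class ToyLazyConcat(object):
        def __init__(self, left, right):
            self.children = [left, right]    # O(1): no characters copied at + time
            self.value = None
        def __add__(self, other):
            return ToyLazyConcat(self, other)
        def _render(self):
            if self.value is None:
                parts, stack = [], [self]
                while stack:                 # iterative walk over the concat tree
                    node = stack.pop()
                    if isinstance(node, ToyLazyConcat):
                        if node.value is not None:
                            parts.append(node.value)
                        else:
                            stack.extend(reversed(node.children))
                    else:
                        parts.append(node)
                self.value = u"".join(parts) # O(total length), paid only once
                self.children = None         # drop the tree after rendering
            return self.value
        def __getitem__(self, i):
            return self._render()[i]         # first lookup O(total), later O(1)

    s = ToyLazyConcat(u"ab", u"cd") + u"ef"  # two constant-time concatenations
    print s[5]                               # renders u"abcdef" once; prints f

So in this model (a + b + c + ...)[i] costs one render, linear in the total length, on first access, with ordinary O(1) indexing afterwards, matching the description given above.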
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 14 00:01:03 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 13 Jan 2007 15:01:03 -0800 Subject: [Patches] [ python-Patches-1563844 ] pybench support for IronPython Message-ID: Patches item #1563844, was opened at 2006-09-23 04:05 Message generated for change (Comment added) made by lemburg You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563844&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: M.-A. Lemburg (lemburg) Summary: pybench support for IronPython Initial Comment: The following patch to pybench/pybench.py makes it work on IronPython. IronPython returns NotImplementedError for both gc.disable() and sys.setcheckinterval() - catch that and report it. This also requires patch #1563842, which fixes platform.py for IronPython. ---------------------------------------------------------------------- >Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-14 00:01 Message: Logged In: YES user_id=38388 Originator: NO Checked in together with revision 53414. Note that I haven't tested this on IronPython. Please reopen if the patch doesn't work. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2006-09-23 04:31 Message: Logged In: YES user_id=29957 Sigh. Somewhere I dropped the magic 'print' statements. Fixed patch applied. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1563844&group_id=5470 From noreply at sourceforge.net Mon Jan 15 12:48:53 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 03:48:53 -0800 Subject: [Patches] [ python-Patches-1617699 ] slice-object support for ctypes Pointer/Array Message-ID: Patches item #1617699, was opened at 2006-12-18 05:28 Message generated for change (Comment added) made by twouters You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617699&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Heller (theller) Summary: slice-object support for ctypes Pointer/Array Initial Comment: Support for slicing ctypes' Pointer and Array types with slice objects, although only for step=1 case. (Backported from p3yk-noslice branch.) ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2007-01-15 12:48 Message: Logged In: YES user_id=34209 Originator: YES The point is that simple slicing will go away, and extended slices (with sliceobjects) are used in various places, and currently can't be passed on to ctypes arrays and pointers. 
That is to say, a Python class defining __getitem__ but not __getslice__ still supports slice syntax, but it can't do 'pointer[sliceobj]' -- it would have to do 'pointer[sliceobj.start:sliceobj.end]'. Also, because simple slices will go away, this code will have to be added to the p3yk branch in any case; having it in the trunk just makes for easier maintenance. Oh, and the non-support for steps other than 1 is not a fundamental issue, I just couldn't bear to write the code for that if you didn't think it would be useful, as I'd already written the same logic and arithmetic for array, tupleseq, mmap and I forget what else :P You can consider this code half-done, if you wish; I'll get to it again soon enough. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 21:48 Message: Logged In: YES user_id=11105 Originator: NO Thomas, a question: Since steps != 1 are not supported, does this patch have any value? IIUC, array[x:y] returns exactly the same as array[x:y:1] for all x and y values. Formally, the patch is missing unittests and documentation ;-). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:45 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1617699&group_id=5470 From noreply at sourceforge.net Mon Jan 15 16:43:10 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 07:43:10 -0800 Subject: [Patches] [ python-Patches-1598415 ] Logging Module - followfile patch Message-ID: Patches item #1598415, was opened at 2006-11-17 09:44 Message generated for change (Comment added) made by cjschr You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.5 >Status: Open Resolution: Fixed Priority: 5 Private: No Submitted By: chads (cjschr) Assigned to: Vinay Sajip (vsajip) Summary: Logging Module - followfile patch Initial Comment: Pertaining to the FileHandler and the file being written to: It's possible that the file being written to will be rolled-over by an external application such as newsyslog. By default, FileHandler tracks the file descriptor, not the file. If the original file is renamed, writes still go to the renamed file through the old descriptor; however, it's probably desired that continued updates go to the original filename instead. This patch adds an attribute to the FileHandler class constructor (and basicConfig kw as well). If the attribute evaluates to True, the filename, not the descriptor, is tracked. Basically, the code compares the file status from a previous emit call to the current call before the base class emit is called. If a difference in st_ino or st_dev is found, the current stream is flushed and closed, a new one, based on baseFilename, is created, file status is updated, and then the base class emit is called.
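For readers skimming the archive, here is a hand-written condensation of the emit-time check described above (a sketch reconstructed from the description, not the patch itself; the WatchedFileHandler that was eventually checked in instead reuses the FileHandler._open() method Vinay mentions below):

    import logging, os
    from stat import ST_DEV, ST_INO

    class FollowingFileHandler(logging.FileHandler):
        # Sketch of the followfile idea: track the name, not the descriptor.
        def __init__(self, filename, mode='a'):
            logging.FileHandler.__init__(self, filename, mode)
            self._mode = mode
            st = os.stat(self.baseFilename)
            self.dev, self.ino = st[ST_DEV], st[ST_INO]

        def emit(self, record):
            # Compare the file now at baseFilename with the one held open.
            try:
                st = os.stat(self.baseFilename)
                changed = st[ST_DEV] != self.dev or st[ST_INO] != self.ino
            except OSError:          # renamed or removed, with no replacement yet
                changed = True
            if changed:
                self.stream.flush()
                self.stream.close()
                self.stream = open(self.baseFilename, self._mode)
                st = os.stat(self.baseFilename)
                self.dev, self.ino = st[ST_DEV], st[ST_INO]
            logging.FileHandler.emit(self, record)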
---------------------------------------------------------------------- >Comment By: chads (cjschr) Date: 2007-01-15 09:43 Message: Logged In: YES user_id=1093928 Originator: YES I like the implementation Vinay. Nice work. Thx ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-14 15:57 Message: Logged In: YES user_id=308438 Originator: NO WatchedFileHandler added to logging.handlers, checked into trunk. Documentation updated, too. ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-11 15:50 Message: Logged In: YES user_id=308438 Originator: NO I've had a bit more of a think about this, and realised that I made a boo-boo in one of my earlier comments. Under Windows, log files are opened with exclusive locks, so that other processes cannot rename or move files which are open. So I believe the approach won't work at all under Windows. (Chad, sorry about making you redo the patch with ST_SIZE rather than ST_DEV and ST_INO). I also think this is a less common use case than warrants supporting it at the basicConfig() level, which is for really very basic usage configuration. So I would advocate adding a WatchedFileHandler (in logging.handlers) which watches st_dev and st_ino (as per Chad's original patch) and closes the old file descriptor and reopens the file when a change is seen. Some recent changes checked into SVN trunk facilitate the reopening - I've added an _open() method to FileHandler to do this. Chad, what do you think of this approach? ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 11:06 Message: Logged In: YES user_id=1093928 Originator: YES Uploaded the wrong diff. This is the correct one. ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 11:02 Message: Logged In: YES user_id=1093928 Originator: YES Updated per vsajip to work on Windoze too. The code now checks for a current size < previous size (based on ST_SIZE). ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2006-11-19 14:32 Message: Logged In: YES user_id=308438 Originator: NO This patch, relying as it does on Unix-specific details such as i-nodes, does not appear as if it will work under Windows. For that reason I will mark it as Pending and Invalid for now, if cjschr can update this tracker item with how the patch will work on Windows, I will look at it further. The SF system will automatically close it if no update is made to the item in approx. 2 weeks, though it can still be reopened after that. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-11-18 13:14 Message: Logged In: YES user_id=849994 Originator: NO Assigning to Vinay. 
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 From noreply at sourceforge.net Mon Jan 15 17:29:21 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 08:29:21 -0800 Subject: [Patches] [ python-Patches-1598415 ] Logging Module - followfile patch Message-ID: Patches item #1598415, was opened at 2006-11-17 15:44 Message generated for change (Comment added) made by vsajip You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.5 >Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: chads (cjschr) Assigned to: Vinay Sajip (vsajip) Summary: Logging Module - followfile patch Initial Comment: Pertaining to the FileHandler and the file being written to: It's possible that the file being written to will be rolled-over by an external application such as newsyslog. By default, FileHandler tracks the file descriptor, not the file. If the original file is renamed, writes still go to the renamed file through the old descriptor; however, it's probably desired that continued updates go to the original filename instead. This patch adds an attribute to the FileHandler class constructor (and basicConfig kw as well). If the attribute evaluates to True, the filename, not the descriptor, is tracked. Basically, the code compares the file status from a previous emit call to the current call before the base class emit is called. If a difference in st_ino or st_dev is found, the current stream is flushed and closed, a new one, based on baseFilename, is created, file status is updated, and then the base class emit is called. ---------------------------------------------------------------------- >Comment By: Vinay Sajip (vsajip) Date: 2007-01-15 16:29 Message: Logged In: YES user_id=308438 Originator: NO You're welcome, Chad. Thanks for the idea. Closing this item. ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2007-01-15 15:43 Message: Logged In: YES user_id=1093928 Originator: YES I like the implementation Vinay. Nice work. Thx ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-14 21:57 Message: Logged In: YES user_id=308438 Originator: NO WatchedFileHandler added to logging.handlers, checked into trunk. Documentation updated, too. ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-11 21:50 Message: Logged In: YES user_id=308438 Originator: NO I've had a bit more of a think about this, and realised that I made a boo-boo in one of my earlier comments. Under Windows, log files are opened with exclusive locks, so that other processes cannot rename or move files which are open. So I believe the approach won't work at all under Windows. (Chad, sorry about making you redo the patch with ST_SIZE rather than ST_DEV and ST_INO). I also think this is a less common use case than warrants supporting it at the basicConfig() level, which is for really very basic usage configuration.
So I would advocate adding a WatchedFileHandler (in logging.handlers) which watches st_dev and st_ino (as per Chad's original patch) and closes the old file descriptor and reopens the file when a change is seen. Some recent changes checked into SVN trunk facilitate the reopening - I've added an _open() method to FileHandler to do this. Chad, what do you think of this approach? ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 17:06 Message: Logged In: YES user_id=1093928 Originator: YES Uploaded the wrong diff. This is the correct one. ---------------------------------------------------------------------- Comment By: chads (cjschr) Date: 2006-11-20 17:02 Message: Logged In: YES user_id=1093928 Originator: YES Updated per vsajip to work on Windoze too. The code now checks for a current size < previous size (based on ST_SIZE). ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2006-11-19 20:32 Message: Logged In: YES user_id=308438 Originator: NO This patch, relying as it does on Unix-specific details such as i-nodes, does not appear as if it will work under Windows. For that reason I will mark it as Pending and Invalid for now, if cjschr can update this tracker item with how the patch will work on Windows, I will look at it further. The SF system will automatically close it if no update is made to the item in approx. 2 weeks, though it can still be reopened after that. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-11-18 19:14 Message: Logged In: YES user_id=849994 Originator: NO Assigning to Vinay. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1598415&group_id=5470 From noreply at sourceforge.net Mon Jan 15 19:53:24 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 10:53:24 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. 
This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-15 18:53 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and moreso if it was a bad idea). My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst-case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation: a = u"a" b = u"b" concat = a + b "concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 16:32 Message: Logged In: YES user_id=6380 Originator: NO Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) 
And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 11:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario: #define MAX_SLICE_DELTA (64*1024) if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) || (size_of_slice > (size_of_original / 2)) ) use_lazy_slice(); else create_string_as_normal(); You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time! > - Style: you set your tab stops to 4 spaces. That is an absolute > no-no! Sorry about that; I'll fix it if I resubmit. > - Segfault in test_array. It seems that it's receiving a unicode > slice object and treating it like a "classic" unicode object. I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case. > - I got it to come to a grinding halt with the following worst-case > scenario: > > a = [] > while True: > x = u"x"*1000000 > x = x[30:60] # Short slice of long string > a.append(x) > > If you can't do better than that, I'll have to reject it. > > PS I used your combined patch, if it matters. It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60].simplify() # Short slice of long string a.append(x) .simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) 
Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far: - Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces. - Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object. - I got it to come to a grinding halt with the following worst-case scenario: a = [] while True: x = u"x"*1000000 x = x[30:60] # Short slice of long string a.append(x) If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface.
Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful; it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this). Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because a wrapper wouldn't necessitate a (not insignificant) change in semantics and 3rd-party code, it would be acceptable.
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES
Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.)
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES
File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES
Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath.
For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES
File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES
lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors.

2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

4. The patch is not accepted.

Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower.
At the very least I suggest the patches are worthy of examination.

* Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO
Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design.
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES
Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this non-obvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted.
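For readers who want to see the gap being discussed, here is a minimal, self-contained timing sketch (illustrative only, in Python 2 syntax to match the rest of the thread; it is not part of any attached patch) contrasting repeated += with the ''.join() idiom that lazy concatenation is meant to make unnecessary:

    import time

    def concat_plus(n):
        # Repeated += can copy the whole accumulated string each
        # time through the loop: O(n**2) in the worst case.
        s = u""
        for i in xrange(n):
            s += u"x"
        return s

    def concat_join(n):
        # Collect the pieces and join once at the end: O(n).
        parts = []
        for i in xrange(n):
            parts.append(u"x")
        return u"".join(parts)

    for func in (concat_plus, concat_join):
        start = time.time()
        func(200000)
        print func.__name__, time.time() - start

With lazy concatenation, the first loop would build a tree of unrendered concatenation objects instead of copying on every iteration, so its cost approaches that of the second.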
----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO
From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow; O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) . Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode?
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES
Continuing the comedy of errors, concat patch #2 was actually the same as #1, it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.)
File Added: lch.py3k.unicode.lazy.concat.patch.3.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES
Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode().
File Added: lch.py3k.unicode.lazy.concat.patch.2.txt
----------------------------------------------------------------------
Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES
jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense.
----------------------------------------------------------------------
Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO
While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g.
list of Unicode strings), some comments:

* you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage

* Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them)

* the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g. if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings
----------------------------------------------------------------------
Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO
What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct?
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Mon Jan 15 19:54:10 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 10:54:10 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent.
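As a rough mental model of the "lazy slices" half of the patch (the real implementation lives at the C level in unicodeobject.c; the class and method names below are invented purely for illustration), a lazy slice can be pictured as a view that postpones the copy, which is also exactly how the worst-case scenario debated in the comments arises:

    class LazySlice(object):
        # Invented illustration, not the patch: remember the base string
        # and the slice bounds, and copy only when rendered.
        def __init__(self, base, start, stop):
            self.base = base
            self.start = start
            self.stop = stop
            self.value = None                 # unrendered

        def render(self):
            if self.value is None:
                self.value = self.base[self.start:self.stop]  # the real copy
                self.base = None              # drop the (possibly huge) base
            return self.value

    big = u"x" * 1000000
    tiny = LazySlice(big, 30, 60)   # O(1): no characters copied yet
    del big                         # ...but all 1000000 stay reachable via tiny.base
    print len(tiny.render())        # rendering copies 30 chars and releases the rest

The memory hazard is visible in the model: until render() runs, the 30-character "slice" keeps the million-character base alive.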
----------------------------------------------------------------------
>Comment By: Larry Hastings (lhastings) Date: 2007-01-15 18:54 Message: Logged In: YES user_id=364875 Originator: YES
As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and more so if it was a bad idea). My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable?
----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 16:32 Message: Logged In: YES user_id=6380 Originator: NO
Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches.
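Larry's recap of lazy concatenation above translates into a pure-Python model in just a few lines (illustrative only; the names mirror the C-level PyUnicodeConcatenationObject loosely, and the real code avoids recursion where possible):

    class LazyConcat(object):
        # Invented illustration, not the patch: hold the two children
        # and render on first access.
        def __init__(self, left, right):
            self.left = left
            self.right = right
            self.value = None                 # "NULL" in the C version

        def render(self):
            if self.value is None:
                parts = []
                def walk(node):
                    if isinstance(node, LazyConcat):
                        walk(node.left)
                        walk(node.right)
                    else:
                        parts.append(node)
                walk(self)                    # walk the tree of children
                self.value = u"".join(parts)  # build the final buffer once
                self.left = self.right = None # free the children
            return self.value

    concat = LazyConcat(u"a", u"b")   # cheap: nothing copied yet
    print concat.render()             # first access renders: prints "ab"

Unlike the lazy-slice model, rendering here frees the children, so no oversized base string is pinned in memory afterwards.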
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470

From noreply at sourceforge.net Mon Jan 15 20:19:45 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 11:19:45 -0800 Subject: [Patches] [ python-Patches-1635473 ] strptime %F and %T directives Message-ID: Patches item #1635473, was opened at 2007-01-14 16:40 Message generated for change (Comment added) made by bcannon You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: strptime %F and %T directives Initial Comment: In response to bug 1633628. %F and %T are valid directives. These are added to Lib/_strptime.py via adding the Y-M-d H:M:S directives in sub-expressions. Includes a test case.
----------------------------------------------------------------------
>Comment By: Brett Cannon (bcannon) Date: 2007-01-15 11:19 Message: Logged In: YES user_id=357491 Originator: NO
Thanks for the work, Mark, but I am going to have to reject this patch. The F and T directives are not necessarily supported on every platform (at least to my knowledge), which is why they are not documented. Because of this I don't want to add support for them to strptime and have to start maintaining directives that are not documented.
----------------------------------------------------------------------
Comment By: Mark Roberts (mark-roberts) Date: 2007-01-14 17:05 Message: Logged In: YES user_id=1591633 Originator: YES
I took a look at the time documentation page, and it did not detail %F and %T, even though they were supported in strftime. I added them to the documentation page since strptime now supports them.
File Added: bug_1633628_strptime_doc.patch
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635473&group_id=5470

From noreply at sourceforge.net Tue Jan 16 01:02:35 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 15 Jan 2007 16:02:35 -0800 Subject: [Patches] [ python-Patches-1037516 ] ftplib PASV error bug Message-ID: Patches item #1037516, was opened at 2004-09-30 15:35 Message generated for change (Comment added) made by wayland You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1037516&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tim Nelson (wayland) Assigned to: Nobody/Anonymous (nobody) Summary: ftplib PASV error bug Initial Comment: Hi. If ftplib gets an error while doing the PASV section of ntransfercmd, it dies. I've altered it so that ntransfercmd does an autodetect, if an autodetect hasn't been done yet. If there are any problems (as I'm not a python programmer :) ), please either fix them or let me know.
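The attachment itself never made it into the tracker (see the follow-up comments), so as a sketch only: the described behavior amounts to retrying the transfer with passive mode toggled off when PASV fails. FTP.ntransfercmd, FTP.set_pasv, and ftplib.all_errors are real ftplib APIs; the wrapper function and its retry policy are invented here for illustration:

    import ftplib

    def ntransfercmd_autodetect(ftp, cmd):
        # Try passive (PASV) mode first; if the server rejects it,
        # switch to active (PORT) mode and retry once.
        try:
            return ftp.ntransfercmd(cmd)
        except ftplib.all_errors:
            ftp.set_pasv(False)
            return ftp.ntransfercmd(cmd)

An in-library fix along these lines would presumably live inside FTP.ntransfercmd itself and remember the outcome, so the detection happens at most once per connection.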
----------------------------------------------------------------------
>Comment By: Tim Nelson (wayland) Date: 2007-01-16 11:02 Message: Logged In: YES user_id=401793 Originator: YES
Oops. I probably did, but I don't work in that job any more, so I'm afraid I don't have access to it. Sorry. You should, however, be able to correct it from the description.
----------------------------------------------------------------------
Comment By: Andrew Bennetts (spiv) Date: 2004-10-06 20:49 Message: Logged In: YES user_id=50945
Did you mean to submit a patch with this bug report? It sounds like you did, but there's no files attached to this bug.
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1037516&group_id=5470

From noreply at sourceforge.net Tue Jan 16 16:33:59 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 07:33:59 -0800 Subject: [Patches] [ python-Patches-1636874 ] File Read/Write Flushing Patch Message-ID: Patches item #1636874, was opened at 2007-01-16 15:33 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1636874&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Windows Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: jurojin (jurojin) Assigned to: Nobody/Anonymous (nobody) Summary: File Read/Write Flushing Patch Initial Comment: The other night I was watching a Google tech talk about Python 3000, and Guido mentioned some problems with the C standard I/O library. In particular he highlighted an issue with switching between reading and writing without flushing, and the fact that it caused serious errors. Not that I don't think it's a good idea to write a new I/O library, but I wondered if it was the same problem I've encountered. It only happens on Windows as far as I know, but the fix is simple... Assuming you have a handle to the file called "Handle" and a Flush() method, the following logic for read and write will allow you to detect and prevent the problem. Add this to the Read() method before reading takes place:

    if ( Handle && (Handle->_flag & _IORW) &&
         (Handle->_flag & (_IOREAD | _IOWRT)) == _IOWRT )
    {
        Flush();
        Handle->_flag |= _IOREAD;
    }

Add this to the Write() method before writing takes place:

    if ( Handle && (Handle->_flag & _IORW) &&
         (Handle->_flag & (_IOREAD | _IOWRT)) == _IOREAD )
    {
        Flush();
        Handle->_flag |= _IOWRT;
    }

Emerson
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1636874&group_id=5470

From noreply at sourceforge.net Tue Jan 16 23:08:14 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 14:08:14 -0800 Subject: [Patches] [ python-Patches-1637157 ] urllib: change email.Utils -> email.utils Message-ID: Patches item #1637157, was opened at 2007-01-16 14:08 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637157&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: change email.Utils -> email.utils Initial Comment: urllib uses the old name email.Utils instead of the new name email.utils. This confuses py2app and possibly other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right).
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637157&group_id=5470

From noreply at sourceforge.net Tue Jan 16 23:09:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 14:09:27 -0800 Subject: [Patches] [ python-Patches-1637159 ] urllib2: email.Utils->email.utils Message-ID: Patches item #1637159, was opened at 2007-01-16 14:09 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637159&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2: email.Utils->email.utils Initial Comment: urllib2 uses the old name email.Utils instead of the new name email.utils. This may confuse py2app and/or other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right).
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637159&group_id=5470

From noreply at sourceforge.net Tue Jan 16 23:11:16 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 14:11:16 -0800 Subject: [Patches] [ python-Patches-1637162 ] smtplib email renames Message-ID: Patches item #1637162, was opened at 2007-01-16 14:11 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637162&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: smtplib email renames Initial Comment: smtplib uses the old names email.Utils and email.base64MIME instead of the new email.utils and email.base64mime. This may confuse py2app and/or other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right).
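For context: these three patches chase the email 4.0 package reorganization that shipped with Python 2.5, where the mixed-case module names (email.Utils, email.base64MIME) became lowercase (email.utils, email.base64mime). The old names still import at runtime through aliases, but tools such as py2app scan source text for the literal spelling of imports. A version-straddling import (illustrative, not taken from the patches themselves) looks like:

    try:
        from email.utils import parsedate    # new lowercase name, email 4.0 / Python 2.5
    except ImportError:
        from email.Utils import parsedate    # old name, Python < 2.5

    print parsedate("Tue, 16 Jan 2007 14:08:14 -0800")

Since the stdlib modules being patched only need to run on their own Python version, the patches can simply switch to the new spelling outright.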
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637162&group_id=5470 From noreply at sourceforge.net Wed Jan 17 07:55:45 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 22:55:45 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 14:55 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) >Assigned to: Thomas Wouters (twouters) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain, when someone only replaces one of them (in sitecustomize.) Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this, from a unittest :P ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 22:55 Message: Logged In: YES user_id=33168 Originator: NO I can think of a nasty way to test this, but it's not really worth it. You'd need to 'install' your own sitecustomize.py by setting PYTHONPATH and spawning a python. Ok, so it's not a real unit test, but it is a test. :-) This looks like it will also crash (before and after the patch) if sys.std{in,out,err} are just deleted rather than replaced (pythonrun.c). sysmodule.c looks fine. I think this is fine for 2.5.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Wed Jan 17 07:56:35 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 22:56:35 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 14:55 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. 
Category: Core (C code) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Wouters (twouters) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain, when someone only replaces one of them (in sitecustomize.) Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this, from a unittest :P ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 22:56 Message: Logged In: YES user_id=33168 Originator: NO Forgot to mention that I agree about the warning. If no one noticed so far, this is such an obscure case, it's not that important to warn. Either way is fine with me. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 22:55 Message: Logged In: YES user_id=33168 Originator: NO I can think of a nasty way to test this, but it's not really worth it. You'd need to 'install' your own sitecustomize.py by setting PYTHONPATH and spawning a python. Ok, so it's not a real unit test, but it is a test. :-) This looks like it will also crash (before and after the patch) if sys.std{in,out,err} are just deleted rather than replaced (pythonrun.c). sysmodule.c looks fine. I think this is fine for 2.5.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Wed Jan 17 08:09:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 23:09:11 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 05:29 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Martin Kammerhofer (mkam) >Assigned to: Thomas Heller (theller) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. 
It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux-specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 23:09 Message: Logged In: YES user_id=33168 Originator: NO Thomas, I don't see any (public) API changes and this fixes a bug. I don't see a reason not to fix this in 2.5.1. If you are comfortable with fixing, apply the patch. Also, please update Misc/NEWS. Thanks! ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 12:21 Message: Logged In: YES user_id=11105 Originator: NO Committed into trunk as revision 53402. Thanks for the patch and the work on it. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 12:11 Message: Logged In: YES user_id=11105 Originator: NO Neal, I think this can go into the release25-maint branch since it repairs the ctypes.util.find_library function on BSD systems. What do you think? ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 03:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails.) I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 12:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running Freebsd6.0, NetBSD3.0, and OpenBSD3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows:

    FreeBSD6.0: libc.so.6, libm.so.4
    NetBSD3.0: libc.so.12, libm.so.0
    OpenBSD3.9: libc.so.39.0, libm.so.2.1

If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 10:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 03:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.)
File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 02:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD has used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the aout executable format.) Unfortunately my freebsd patch has the assumption of a single version number built in; more specifically the cmp(* map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 12:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 in OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-07 23:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 13:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 11:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Wed Jan 17 08:42:42 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 16 Jan 2007 23:42:42 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-11 23:13 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) >Assigned to: Guido van Rossum (gvanrossum) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-16 23:42 Message: Logged In: YES user_id=33168 Originator: NO Guido, this is the patch I was talking about wrt supporting a print function in 2.6. exec could get similar treatment. You mentioned in mail that things like except E as V: can go in without a future stmt. I agree. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-11 23:31 Message: Logged In: YES user_id=29957 Originator: YES Updated version of patch - fixes interactive mode, adds builtins.print File Added: print_function.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Wed Jan 17 16:24:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 07:24:27 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-12 02:13 Message generated for change (Comment added) made by gvanrossum You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) >Assigned to: Nobody/Anonymous (nobody) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. 
---------------------------------------------------------------------- >Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-17 10:24 Message: Logged In: YES user_id=6380 Originator: NO I don't think we need to do anything special for exec, as the exec(s, locals, globals) syntax is already (still :-) supported in 2.x with identical semantics as in 3.0. except E as V *syntax* can go in without a future stmt; and (only when that syntax is used) it should also enforce the new semantics (V must be a simple name and is deleted at the end of the except clause). I think Anthony's patch is a great idea, but I'll refrain from reviewing it. I'd say "just do it". :-) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 02:42 Message: Logged In: YES user_id=33168 Originator: NO Guido, this is the patch I was talking about wrt supporting a print function in 2.6. exec could get similar treatment. You mentioned in mail that things like except E as V: can go in without a future stmt. I agree. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-12 02:31 Message: Logged In: YES user_id=29957 Originator: YES Updated version of patch - fixes interactive mode, adds builtins.print File Added: print_function.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Wed Jan 17 16:58:31 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 07:58:31 -0800 Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function Message-ID: Patches item #1633807, was opened at 2007-01-12 08:13 Message generated for change (Comment added) made by twouters You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Anthony Baxter (anthonybaxter) Assigned to: Nobody/Anonymous (nobody) Summary: from __future__ import print_function Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0 Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah. ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2007-01-17 16:58 Message: Logged In: YES user_id=34209 Originator: NO You seem to have '#if 0'ed-out some code related to the with/as-statement warnings; I suggest just removing them. Since you're in this code now, it might make sense to provide a commented out warning about the use of the print statement, so we won't have to figure it out later (in Python 2.9 or when we add -Wp3yk.) 
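For reference, a minimal sketch of the usage the patch aims for, assuming the keyword arguments (sep, end, file) of builtin_print() in the p3yk trunk carry over unchanged:

    from __future__ import print_function
    import sys

    print("ham", "eggs", sep=", ")        # print is now a function, not a statement
    print("warning", file=sys.stderr)     # the file keyword replaces ">> stream"
    print("no newline", end="")           # end replaces the trailing comma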
It needs a test, and probably a doc change somewhere. ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-17 16:24 Message: Logged In: YES user_id=6380 Originator: NO I don't think we need to do anything special for exec, as the exec(s, locals, globals) syntax is already (still :-) supported in 2.x with identical semantics as in 3.0. except E as V *syntax* can go in without a future stmt; and (only when that syntax is used) it should also enforce the new semantics (V must be a simple name and is deleted at the end of the except clause). I think Anthony's patch is a great idea, but I'll refrain from reviewing it. I'd say "just do it". :-) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 08:42 Message: Logged In: YES user_id=33168 Originator: NO Guido, this is the patch I was talking about wrt supporting a print function in 2.6. exec could get similar treatment. You mentioned in mail that things like except E as V: can go in without a future stmt. I agree. ---------------------------------------------------------------------- Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-12 08:31 Message: Logged In: YES user_id=29957 Originator: YES Updated version of patch - fixes interactive mode, adds builtins.print File Added: print_function.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 From noreply at sourceforge.net Wed Jan 17 20:59:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 11:59:11 -0800 Subject: [Patches] [ python-Patches-1610795 ] BSD version of ctypes.util.find_library Message-ID: Patches item #1610795, was opened at 2006-12-07 14:29 Message generated for change (Comment added) made by theller You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) >Group: Python 2.5 >Status: Closed >Resolution: Fixed Priority: 9 Private: No Submitted By: Martin Kammerhofer (mkam) Assigned to: Thomas Heller (theller) Summary: BSD version of ctypes.util.find_library Initial Comment: The ctypes.util.find_library function for Posix systems is actually tailored for Linux systems. While the _findlib_gcc function relies only on the GNU compiler and may therefore work on any system with the "gcc" command in PATH, the _findLib_ld function relies on the /sbin/ldconfig command (originating from SunOS 4.0) which is not standardized. The version from GNU libc differs in option syntax and output format from other ldconfig programs around. I therefore provide a patch that enables find_library to properly communicate with the ldconfig program on FreeBSD systems. It has been tested on FreeBSD 4.11 and 6.2. It probably works on other *BSD systems too. (It works without this patch on FreeBSD, because after getting an error from ldconfig it falls back to _findlib_gcc.) While at it I also tidied up the Linux specific code: I'm escaping the function argument before interpolating it into a regular expression (to protect against nasty regexps) and removed the code for creation of a temporary file that was not used in any way. 
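For illustration, the call the patch repairs, with the FreeBSD 6.0 return values Thomas reports earlier in this thread (before the patch, both calls returned None there):

    from ctypes.util import find_library

    print find_library('c')    # 'libc.so.6' on FreeBSD 6.0
    print find_library('m')    # 'libm.so.4' on FreeBSD 6.0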
---------------------------------------------------------------------- >Comment By: Thomas Heller (theller) Date: 2007-01-17 20:59 Message: Logged In: YES user_id=11105 Originator: NO Thanks, Neal, and Martin, again. Committed as r53471 (and r53473 for Misc/NEWS) in the release25-maint branch. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 08:09 Message: Logged In: YES user_id=33168 Originator: NO Thomas, I don't see any (public) API changes and this fixes a bug. I don't see a reason not to fix this in 2.5.1. If you are comfortable with fixing, apply the patch. Also, please update Misc/NEWS. Thanks! ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 21:21 Message: Logged In: YES user_id=11105 Originator: NO Committed into trunk as revision 53402. Thanks for the patch and the work on it. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-12 21:11 Message: Logged In: YES user_id=11105 Originator: NO Neal, I think this can go into the release25-maint branch since it repairs the ctypes.util.find_library function on BSD systems. What do you think? ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2007-01-10 12:58 Message: Logged In: YES user_id=1656067 Originator: YES The output looks good. The patch selects the numerically highest library version. NetBSD is not handled by the patch but works through _findLib_gcc (which will also be tried as a fallback strategy for Free/Open-BSD when ldconfig output parsing fails.) I think the patch is ready for commit. ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2007-01-09 21:01 Message: Logged In: YES user_id=11105 Originator: NO mkam, I was eventually able to test out your patch. I have virtual machines running Freebsd6.0, NetBSD3.0, and OpenBSD3.9. The output from "print find_library('c'), find_library('m')" on these systems is as follows:

    FreeBSD6.0: libc.so.6, libm.so.4
    NetBSD3.0: libc.so.12, libm.so.0
    OpenBSD3.9: libc.so.39.0, libm.so.2.1

If you think this is what is expected, I'm happy to apply the patch. Or is there further work needed on it? (Do you still need the output of "ldconfig -r" or whatever?) ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-20 19:43 Message: Logged In: YES user_id=11105 Originator: NO Unfortunately I'm unable to review or work on this patch *this year*. I will definitely take a look in January. Sorry. ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-12 12:28 Message: Logged In: YES user_id=1656067 Originator: YES Here is the revised patch. Tested on a (virtual) OpenBSD 3.9 machine, FreeBSD 4.11, FreeBSD 6.2 and DragonFlyBSD 1.6. Does not make assumptions on how many version numbers are appended to a library name any more. Even mixed length names (e.g. libfoo.so.8.9 vs. libfoo.so.10) compare in a meaningful way. (BTW: I also tried NetBSD 2.0.2, but its ldconfig is too different.)
File Added: ctypes-util.py.patch ---------------------------------------------------------------------- Comment By: Martin Kammerhofer (mkam) Date: 2006-12-11 11:10 Message: Logged In: YES user_id=1656067 Originator: YES Hm, I did not know that OpenBSD is still using two version numbers for shared libraries. (I conclude that from the "libc.so.39.0" in the previous followup. Btw FreeBSD has used a MAJOR.MINOR[.DEWEY] scheme during the ancient days of the aout executable format.) Unfortunately my freebsd patch has the assumption of a single version number built in; more specifically the cmp(* map(lambda x: int(x.split('.')[-1]), (a, b))) is supposed to sort based on the last dot-separated field. I guess that OpenBSD system does not have another libc, at least none with a minor > 0. ;-) Thomas, can you mail me the output of "ldconfig -r"? I will refine the patch then, doing a more general sort algorithm; i.e. sort by all trailing /(\.\d+)+/ fields. Said output from NetBSD welcome too. DragonflyBSD should be no problem since it is a fork of FreeBSD 4.8, but what does its sys.platform look like? ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-08 21:32 Message: Logged In: YES user_id=11105 Originator: NO I have tested the patch on FreeBSD 6.0 and (after extending the check to test for sys.platform.startswith("openbsd")) on OpenBSD 3.9 and it works fine. find_library("c") now returns libc.so.6 on FreeBSD 6.0, and libc.so.39.0 in OpenBSD 3.9, while it returned 'None' before on both machines. ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:50 Message: Logged In: YES user_id=2135 Originator: NO # Does this work (without the gcc fallback) on other *BSD systems too? I don't know, but it doesn't work on Darwin (which already has a custom method through macholib). ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 22:11 Message: Logged In: YES user_id=11105 Originator: NO Will do (although I would appreciate review from others too; I'm not exactly a BSD expert). ---------------------------------------------------------------------- Comment By: Martin v. Löwis (loewis) Date: 2006-12-07 20:15 Message: Logged In: YES user_id=21627 Originator: NO Thomas, can you take a look? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610795&group_id=5470 From noreply at sourceforge.net Wed Jan 17 21:07:38 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 12:07:38 -0800 Subject: [Patches] [ python-Patches-1638033 ] Add httponly to Cookie module Message-ID: Patches item #1638033, was opened at 2007-01-17 21:07 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Arvin Schnell (arvins) Assigned to: Nobody/Anonymous (nobody) Summary: Add httponly to Cookie module Initial Comment: Add the Microsoft extension httponly to the Cookie module.
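A minimal sketch of the intended usage, assuming the patch registers the attribute under the lowercase key 'httponly' by analogy with the existing 'secure' attribute:

    from Cookie import SimpleCookie

    c = SimpleCookie()
    c['session'] = 'abc123'
    c['session']['httponly'] = True   # assumed key; see the patch for the actual name
    print c.output()                  # e.g. Set-Cookie: session=abc123; HttpOnly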
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 From noreply at sourceforge.net Thu Jan 18 04:52:36 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 17 Jan 2007 19:52:36 -0800 Subject: [Patches] [ python-Patches-1638243 ] compiler.pycodegen causes crashes when compiling 'with' Message-ID: Patches item #1638243, was opened at 2007-01-17 22:52 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638243&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Parser/Compiler Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: kirat (kirat) Assigned to: Nobody/Anonymous (nobody) Summary: compiler.pycodegen causes crashes when compiling 'with' Initial Comment: The compiler package in the python library is missing a LOAD/DELETE just before the WITH_CLEANUP instruction. Also transformer isn't creating the with_var as an assignment. So the following little code snippet will crash if you compile and run it with compiler.compile():

    class TrivialContext:
        def __enter__(self):
            return self
        def __exit__(self,*exc_info):
            pass

    def f():
        with TrivialContext() as tc:
            return 1

    f()

The fix is just a few lines. I'm enclosing a patch against the python 2.5 source. I've also added the above as a test case to the test_compiler.py file. regards, -Kirat ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638243&group_id=5470 From noreply at sourceforge.net Thu Jan 18 20:03:34 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 18 Jan 2007 11:03:34 -0800 Subject: [Patches] [ python-Patches-1638879 ] Fix to the long("123\0", 10) problem Message-ID: Patches item #1638879, was opened at 2007-01-18 14:03 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638879&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Calvin Spealman (ironfroggy) Assigned to: Nobody/Anonymous (nobody) Summary: Fix to the long("123\0", 10) problem Initial Comment: This is a simple patch adapted from the int_new function to the long_new function.
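A hedged illustration of the inconsistency being fixed; the pre-patch long() behaviour is inferred from the summary (int_new already rejects embedded null bytes, and the patch copies that check to long_new):

    int("123\0", 10)    # raises ValueError (embedded null byte)
    long("123\0", 10)   # before the patch, long_new lacked the equivalent check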
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638879&group_id=5470 From noreply at sourceforge.net Fri Jan 19 16:06:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 19 Jan 2007 07:06:27 -0800 Subject: [Patches] [ python-Patches-1638033 ] Add httponly to Cookie module Message-ID: Patches item #1638033, was opened at 2007-01-17 15:07 Message generated for change (Comment added) made by jimjjewett You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Arvin Schnell (arvins) Assigned to: Nobody/Anonymous (nobody) Summary: Add httponly to Cookie module Initial Comment: Add the Microsoft extension httponly to the Cookie module. ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-19 10:06 Message: Logged In: YES user_id=764593 Originator: NO The documentation change should say what the attribute does. (It requests that the cookie be hidden from javascript, and available only to http requests.) ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 From noreply at sourceforge.net Fri Jan 19 18:01:21 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 19 Jan 2007 09:01:21 -0800 Subject: [Patches] [ python-Patches-1638033 ] Add httponly to Cookie module Message-ID: Patches item #1638033, was opened at 2007-01-17 21:07 Message generated for change (Comment added) made by arvins You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Arvin Schnell (arvins) Assigned to: Nobody/Anonymous (nobody) Summary: Add httponly to Cookie module Initial Comment: Add the Microsoft extension httponly to the Cookie module. ---------------------------------------------------------------------- >Comment By: Arvin Schnell (arvins) Date: 2007-01-19 18:01 Message: Logged In: YES user_id=698939 Originator: YES Sure, I have added some documentation to the patch. File Added: python.diff ---------------------------------------------------------------------- Comment By: Jim Jewett (jimjjewett) Date: 2007-01-19 16:06 Message: Logged In: YES user_id=764593 Originator: NO The documentation change should say what the attribute does. (It requests that the cookie be hidden from javascript, and available only to http requests.)
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470 From noreply at sourceforge.net Sat Jan 20 02:15:21 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 19 Jan 2007 17:15:21 -0800 Subject: [Patches] [ python-Patches-1639973 ] email.utils.parsedate documentation Message-ID: Patches item #1639973, was opened at 2007-01-19 19:15 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1639973&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Documentation Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: email.utils.parsedate documentation Initial Comment: See bug 1629566 (python.org/sf/1629566) for discussion. This patch eliminates any ambiguity in the documentation regarding which fields of the time tuple it refers to. This patch specifies the documentation in both librfc822.tex and emailutil.tex. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1639973&group_id=5470 From noreply at sourceforge.net Sat Jan 20 03:44:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Fri, 19 Jan 2007 18:44:33 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 17:46 Message generated for change (Comment added) made by mark-roberts You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399. Definitely a backport candidate. ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-19 20:44 Message: Logged In: YES user_id=1591633 Originator: NO Patch looks good to me, and the tests still pass. If it matters, I would like to see a test case presented in the patch as well.
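A hypothetical sketch of the kind of test case Mark asks for; the local server setup is illustrative, and the actual patch may verify socket closure differently:

    import BaseHTTPServer
    import threading
    import urllib2

    class OneShotHandler(BaseHTTPServer.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.send_header('Content-Length', '2')
            self.end_headers()
            self.wfile.write('ok')

    # serve exactly one request on an ephemeral port in the background
    server = BaseHTTPServer.HTTPServer(('127.0.0.1', 0), OneShotHandler)
    threading.Thread(target=server.handle_request).start()

    f = urllib2.urlopen('http://127.0.0.1:%d/' % server.server_port)
    f.read()
    f.close()   # the fix in #1601399 is about making this really close the socket
    server.server_close()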
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Sat Jan 20 10:16:44 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 20 Jan 2007 01:16:44 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-20 09:16 Message: Logged In: YES user_id=364875 Originator: YES Whoops, sorry. I refreshed a summary page I had lying around, which I guess re-posted the comment! Didn't mean to spam you with extra updates. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-20 09:14 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and moreso if it was a bad idea). My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst-case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. 
A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-15 18:54 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and moreso if it was a bad idea). My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst-case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-15 18:53 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and moreso if it was a bad idea).
My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst-case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 16:32 Message: Logged In: YES user_id=6380 Originator: NO Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches.
---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 11:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario:

    #define MAX_SLICE_DELTA (64*1024)
    if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) ||
         (size_of_slice > (size_of_original / 2)) )
        use_lazy_slice();
    else
        create_string_as_normal();

You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time!

> - Style: you set your tab stops to 4 spaces. That is an absolute
> no-no!

Sorry about that; I'll fix it if I resubmit.

> - Segfault in test_array. It seems that it's receiving a unicode
> slice object and treating it like a "classic" unicode object.

I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case.

> - I got it to come to a grinding halt with the following worst-case
> scenario:
>
>     a = []
>     while True:
>         x = u"x"*1000000
>         x = x[30:60] # Short slice of long string
>         a.append(x)
>
> If you can't do better than that, I'll have to reject it.
>
> PS I used your combined patch, if it matters.

It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario:

    a = []
    while True:
        x = u"x"*1000000
        x = x[30:60].simplify() # Short slice of long string
        a.append(x)

.simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario.
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far:

- Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces.

- Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object.

- I got it to come to a grinding halt with the following worst-case scenario:

    a = []
    while True:
        x = u"x"*1000000
        x = x[30:60] # Short slice of long string
        a.append(x)

If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this).
Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because it wouldn't necessitate a (not insignificant) change in semantics and 3rd party code, would make it acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely:

1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors.

2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*.

4. The patch is not accepted.

Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer.
No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) .
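To make that advice concrete, here is a small self-contained illustration of the two patterns (plain Python, not taken from any of the patches discussed here):

    pieces = [u"spam"] * 10000

    # The quadratic pattern: each += may copy the whole accumulated string.
    x = u""
    for y in pieces:
        x += y

    # The rewrite suggested on python-list: collect the pieces, join once.
    z = []
    for y in pieces:
        z.append(y)
    assert x == u''.join(z)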
Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1: it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g.
if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sat Jan 20 10:14:41 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 20 Jan 2007 01:14:41 -0800 Subject: [Patches] [ python-Patches-1629305 ] The Unicode "lazy strings" patches Message-ID: Patches item #1629305, was opened at 2007-01-06 09:37 Message generated for change (Comment added) made by lhastings You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 3000 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Larry Hastings (lhastings) Assigned to: Nobody/Anonymous (nobody) Summary: The Unicode "lazy strings" patches Initial Comment: These are patches to add lazy processing to Unicode strings for Python 3000. I plan to post separate patches for both "lazy concatenation" and "lazy slices", as I suspect "lazy concatenation" has a much higher chance of being accepted. There is a long discussion about "lazy concatenation" here: http://mail.python.org/pipermail/python-dev/2006-October/069224.html And another long discussion about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html Note that, unlike the 8-bit-character strings patches, I don't expect the "lazy slices" patch to be dependent on the "lazy concatenation" patch. Unicode objects are stored differently, and already use a pointer to a separately-allocated buffer. This was the big (and mildly controversial) change made by the 8-bit-character "lazy concatenation" patch, and "lazy slices" needed it too. Since Unicode objects already look like that, the Unicode lazy patches should be independent. ---------------------------------------------------------------------- >Comment By: Larry Hastings (lhastings) Date: 2007-01-20 09:14 Message: Logged In: YES user_id=364875 Originator: YES As discussed (briefly) over email, I'm moving this discussion back to the Python-3000 mailing list. But before I do I wanted to clear up something from your reply. "lazy concatenation" and "lazy slices" are really two patches, filed under the "lazy slices" penumbra. They are different optimizations, with different implementations and different behaviors. I implemented them cumulatively to save work because they intertwine when merged, but I had hoped they would be considered independently. I apologize if this point was unclear (and more so if it was a bad idea).
My reason for doing so: I suspected "lazy slices" were doomed from the start; doing the patch this way meant wasting less work. One downside of "lazy slices" is their ability to waste loads of memory in the worst case. Now, "lazy concatenation" simply doesn't have that problem. Yet the fourth and fifth paragraphs of your most recent reply imply you think it can. A quick recap of lazy concatenation:

    a = u"a"
    b = u"b"
    concat = a + b

"concat" is a PyUnicodeConcatenationObject holding references to a and b (or rather their values). Its "value" is NULL, indicating that it is unrendered. The moment someone asks for the value of "concat", the object allocates space for its value, constructs the value by walking its tree of children, and frees its children. The implementation is heavily optimized for the general case (concatenation) and avoids recursion where possible. The worst-case memory consumption behavior of lazy concatenation is adding lots and lots of tiny strings and never rendering; that will allocate lots of PyUnicodeConcatenationObjects. But it's nowhere near as bad as a short lazy slice of a long string. Does that make "lazy concatenation" more palatable? ---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-14 16:32 Message: Logged In: YES user_id=6380 Originator: NO Sorry, the test_array failure was due to not rebuilding after patching. Because extension modules are built using distutils, they don't get automatically rebuilt when a relevant header has changed. "grind to a halt": swapping, probably, due to memory filling up with 1M-character string objects, as you experienced yourself. Your proposal takes the edge off, although I can still come up with a worst-case scenario (just use 64K strings instead of 1M strings, and leave the rest the same). I am far from convinced that replacing one pathological case (O(N**2) concatenation, which is easily explained and avoided) with another (which is harder to explain due to the more complicated algorithms and heuristics involved) is a good trade-off. This is all the worse since your optimization doesn't have a clear time/space trade-off: it mostly attempts to preserve time *and* space, but in the worst case it can *waste* space. (And I'm not convinced there can't be a pathological case where it is slower, too.) And the gains are dependent on the ability to *avoid* ultimately rendering the string; if every string ends up being rendered, there is no net gain in space, and there might be no net gain in time either (at least not for slices). I believe I would rather not pursue this patch further at this time; a far more important programming task is the str/unicode unification (now that the int/long unification is mostly there). If you want to clean up the patch, I suggest that you add a large comment section somewhere (unicode.h?) describing the algorithms in a lot of detail, including edge cases and performance analysis, to make review of the code possible. But you're most welcome to withdraw it, too; it would save me a lot of headaches.
---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 11:44 Message: Logged In: YES user_id=364875 Originator: YES Here's another possible fix for the worst-case scenario:

    #define MAX_SLICE_DELTA (64*1024)

    if ( ((size_of_slice + MAX_SLICE_DELTA) > size_of_original) ||
         (size_of_slice > (size_of_original / 2)) )
        use_lazy_slice();
    else
        create_string_as_normal();

You'd still get the full benefit of lazy slices most of the time, but it takes the edge off the really pathological cases. How's that? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-14 10:42 Message: Logged In: YES user_id=364875 Originator: YES Thanks for taking the time!

> - Style: you set your tab stops to 4 spaces. That is an absolute
>   no-no!

Sorry about that; I'll fix it if I resubmit.

> - Segfault in test_array. It seems that it's receiving a unicode
>   slice object and treating it like a "classic" unicode object.

I tested on Windows and Linux, and I haven't seen that behavior. Which test_array, by the way? In Lib/test, or Lib/ctypes/test? I'm having trouble with most of the DLL extensions on Windows; they complain that the module uses the incompatible python26.dll or python26_d.dll. So I haven't tested ctypes/test_array.py on Windows, but I have tested the other three permutations of Linux vs Windows and Lib/test/test_array vs Lib/ctypes/test/test_array. Can you give me a stack trace to the segfault? With that I bet I can fix it even without a reproducible test case.

> - I got it to come to a grinding halt with the following worst-case
>   scenario:
>
>     a = []
>     while True:
>         x = u"x"*1000000
>         x = x[30:60] # Short slice of long string
>         a.append(x)
>
> If you can't do better than that, I'll have to reject it.
>
> PS I used your combined patch, if it matters.

It matters. The combined patch has "lazy slices", the other patch does not. When you say "grind to a halt" I'm not sure what you mean. Was it thrashing? How much CPU was it using? When I ran that test, my Windows computer got to 1035 iterations then threw a MemoryError. My Linux box behaved the same, except it got to 1605 iterations. Adding a call to .simplify() on the slice defeats this worst-case scenario:

    a = []
    while True:
        x = u"x"*1000000
        x = x[30:60].simplify() # Short slice of long string
        a.append(x)

.simplify() forces lazy strings to render themselves. With that change, this test will run until the cows come home. Is that acceptable? Failing that, is there any sort of last-ditch garbage collection pass that gets called when a memory allocation fails but before it returns NULL? If so, I could hook in to that and try to render some slices. (I don't see such a pass, but maybe I missed it.) Failing that, I could add garbage-collect-and-retry-once logic to memory allocation myself, either just for unicodeobject.c or as a global change. But I'd be shocked if you were interested in that approach; if Python doesn't have such a thing by now, you probably don't want it. And failing that, "lazy slices" are probably toast. It always was a tradeoff of speed for worst-case memory use, and I always knew it might not fly. If that's the case, please take a look at the other patch, and in the meantime I'll see if anyone can come up with other ways to mitigate the worst-case scenario.
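For clarity, the heuristic proposed above can be restated as runnable Python (the function and argument names are invented for illustration):

    MAX_SLICE_DELTA = 64 * 1024

    def should_use_lazy_slice(size_of_slice, size_of_original):
        # Stay lazy only when the slice is large relative to the original
        # (or the original is small anyway), so that a tiny slice can
        # never pin a huge buffer in memory.
        return (size_of_slice + MAX_SLICE_DELTA > size_of_original or
                size_of_slice > size_of_original // 2)

    print(should_use_lazy_slice(30, 1000000))       # False: copy as normal
    print(should_use_lazy_slice(900000, 1000000))   # True: keep the lazy slice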
---------------------------------------------------------------------- Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-13 23:59 Message: Logged In: YES user_id=6380 Originator: NO Problems so far:

- Style: you set your tab stops to 4 spaces. That is an absolute no-no! You can indent using 4 spaces, but you should NEVER assume that a TAB character is anything except 8 spaces.

- Segfault in test_array. It seems that it's receiving a unicode slice object and treating it like a "classic" unicode object.

- I got it to come to a grinding halt with the following worst-case scenario:

    a = []
    while True:
        x = u"x"*1000000
        x = x[30:60] # Short slice of long string
        a.append(x)

If you can't do better than that, I'll have to reject it. PS I used your combined patch, if it matters. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-13 00:03 Message: Logged In: YES user_id=364875 Originator: YES File Added: pybench.first.results.zip ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 17:57 Message: Logged In: YES user_id=364875 Originator: YES josiahcarlson: I think you misunderstood options 2 and 3. The empty string (option 2) or nonempty but fixed-size string (option 3) would *only* be returned in the event of an allocation failure, aka "the process is out of memory". Since it's out of memory yet trying to allocate more, it has *already* failed. My goal in proposing options 2 and 3 was that, when this happens (and it eventually will), Python would fail *gracefully* with an exception, rather than *miserably* with a bus error. As for writing a wrapper, I'm just not interested. I'm a strong believer in "There should be one--and preferably only one--obvious way to do it", and I feel a special-purpose wrapper class for good string performance adds mental clutter. The obvious way to do string concatenation is with "+"; the obvious way to do string slices is with "[:]". My goal is to make those fast so that you can use them *everywhere*--even in performance-critical code. I don't want a wrapper class, and have no interest in contributing to one. For what it's worth, I came up with a fifth approach this morning while posting to the Python-3000 mailing list: pre-allocate the str buffer, updating it to the correct size whenever the lazy object changes size. That would certainly fix the problem; the error would occur in a much more reportable place. But it would also slow down the code quite a lot, negating many of the speed gains of this approach. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-12 06:55 Message: Logged In: YES user_id=341410 Originator: NO I don't think that changing the possible return of PyUnicode_AS_UNICODE is reasonable. (option 1) Option 2 breaks the buffer interface. Option 3 severely limits the size of potential unicode strings. If you are only manipulating tiny unicode strings (8k?), then the effect of fast concatenation, slicing, etc., isn't terribly significant. Option 4 is possible, but I know I would feel bad if all of this work went to waste. Note what M. A. Lemburg mentioned. The functionality is useful, it's the polymorphic representation that is the issue. Rather than attempting to change the unicode representation, what about a wrapper type? Keep the base unicode representation simple (both Guido and M. A. have talked about this).
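To illustrate, one possible shape for such a wrapper type (a toy sketch, not from any of the attached patches; note that a view deliberately keeps the whole base string alive, which is exactly the memory trade-off discussed above):

    class StrView(object):
        # Records (base, start, stop) instead of copying characters;
        # a step-1 slice of a view is again a view.
        def __init__(self, base, start=0, stop=None):
            self.base = base
            self.start = start
            self.stop = len(base) if stop is None else stop

        def __len__(self):
            return self.stop - self.start

        def __getitem__(self, i):
            if isinstance(i, slice):
                start, stop, step = i.indices(len(self))
                if step == 1:
                    return StrView(self.base, self.start + start,
                                   self.start + stop)
            return self.materialize()[i]

        def materialize(self):
            # The only point where characters are actually copied.
            return self.base[self.start:self.stop]

    v = StrView(u"x" * 1000000)[30:60]   # O(1): no megabyte copy is made
    print(len(v))                        # 30
    print(v.materialize())               # the 30 characters, copied only now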
Guido has also stated that he wouldn't be against views (slicing and/or concatenation) if they could be shown to have real use-cases. The use-cases you have offered here are still applicable, and because a view wouldn't necessitate a (not insignificant) change in semantics and 3rd-party code, it would be acceptable. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:32 Message: Logged In: YES user_id=364875 Originator: YES Just fixed the build under Linux--sorry, should have done that before posting the original patch. Patches now built and tested under Win32 and Linux, and produce the same output as an unpatched py3k trunk. lemburg: A minor correction: the full "lazy strings" patch (with "lazy slices") also touches "stringlib/partition.h", "stringlib/readme.txt", and "Objects/stringobject.c", in addition to the two unicodeobject.* files. The changes to these three files are minuscule, and don't affect their maintainability, so the gist of my statements still holds. (Besides, all three of those files will probably go away before Py3k ships.) File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 04:25 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 03:12 Message: Logged In: YES user_id=364875 Originator: YES Attached below you will find the full "lazy strings" patch, which has both "lazy concatenation" and "lazy slices". The diff is against the current revision of the Py3k branch, #53392. On my machine (Win32) rt.bat produces identical output before and after the patch, for both debug and release builds. As I mentioned in a previous comment, you can read the description (and ensuing conversation) about "lazy slices" here: http://mail.python.org/pipermail/python-dev/2006-October/069506.html One new feature of this version: I added a method on a Unicode string, s.simplify(), which forces the string to "render" if it's one of my exotic string subtypes (a lazy concatenation or lazy slice). My goal is to assuage fears about pathological memory-use cases where you have long-lived tiny slices of gigantic strings. If you realize you're having that problem, simply add calls to .simplify() on the slices and the problem should go away. As for the semantics of .simplify(), it returns a reference to the string s. Honestly I wasn't sure whether it should return a new string or just monkey with the existing string. Really, rendering doesn't change the string; it's the same string, with the exact same external behavior, just with different bits floating around underneath. For now it monkeys with the existing string, as that seemed best. (But I'd be happy to switch it to returning a new string if it'd help.) I had planned to make the "lazy slices" patch independent of the "lazy concatenation" patch. However, it wound up being a bigger pain than I thought, and anyway I figure the likelihood that "lazy slices" would be accepted and "lazy concatenation" would not is effectively zero. So I didn't bother. If there's genuine interest in "lazy slices" without "lazy concatenation", I can produce such a thing.
File Added: lch.py3k.unicode.lazy.slice.and.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:50 Message: Logged In: YES user_id=364875 Originator: YES File Added: lch.py3k.unicode.lazy.concat.patch.53392.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-12 02:42 Message: Logged In: YES user_id=364875 Originator: YES lemburg: You're right, the possibility of PyUnicode_AS_UNICODE() returning NULL is new behavior, and this could conceivably result in crashes. To be clear: NULL return values will only happen when allocation of the final "str" buffer fails during lazy rendering. This will only happen in out-of-memory conditions; for right now, while the patch is under early review, I suspect that's okay. So far I've come up with four possible ways to resolve this problem, which I will list here from least-likely to most-likely: 1. Redefine the API such that PyUnicode_AS_UNICODE() is allowed to return NULL, and fix every place in the Python source tree that calls it to check for a NULL return. Document this with strong language for external C module authors. 2. Change the length to 0 and return a constant empty string. Suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 3. Change the length to 0 and return a previously-allocated buffer of some hopefully-big-enough-size (4096 bytes? 8192 bytes?), such that even if the caller iterates over the buffer, odds are good they'll stop before they hit the end. Again, suggest that users of the Unicode API ask for the pointer *first* and the length *second*. 4. The patch is not accepted. Of course, I'm open to suggestions of other approaches. (Not to mention patches!) Regarding your memory usage and "slice integers" comments, perhaps you'll be interested in the full lazy patch, which I hope to post later today. "Lazy concatenation" is only one of the features of the full patch; the other is "lazy slices". For a full description of my "lazy slices" implementation, see this posting (and the subsequent conversation) to Python-Dev: http://mail.python.org/pipermail/python-dev/2006-October/069506.html And yes, lazy slices suffer from the same possible-NULL-return-from-PyUnicode_AS_UNICODE() problem that lazy concatenation does. As for your final statement, I never claimed that this was a particularly clean design. I merely claim it makes things faster and is (so far) self-contained. For the Unicode versions of my lazy strings patches, the only files I touched were "Include/unicodeobject.h" and "Objects/unicodeobject.c". I freely admit my patch makes those files *even fussier* to work on than they already are. But if you don't touch those files, you won't notice the difference*, and the patch makes some Python string operations faster without making anything else slower. At the very least I suggest the patches are worthy of examination. * Barring API changes to rectify the possible NULL return from PyUnicode_AS_UNICODE() problem, that is. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-10 20:59 Message: Logged In: YES user_id=38388 Originator: NO Larry, I probably wasn't clear enough: PyUnicode_AS_UNICODE() returns a pointer to the underlying Py_UNICODE buffer. 
No API using this macro checks for a NULL return value of the macro since a Unicode object is guaranteed to have a non-NULL Py_UNICODE buffer. As a result, a memory error caused during the concatenation process cannot be passed back up the call stack. The NULL return value would result in a plain segfault in the calling API. Regarding the tradeoff and trying such an approach: I've done such tests myself (not with Unicode but with 8-bit strings) and it didn't pay off. The memory consumption outweighs the performance you gain by using the 'x += y' approach. The ''.join(list) approach also doesn't really help if you're after performance (for much the same reasons). In mxTextTools I used slice integers pointing into the original parsed string to work around these problems, which works great and avoids creating short strings altogether (so you gain speed and memory). A patch I would find a lot more useful is one to create a Unicode alternative to cStringIO - for strings, this is by far the most performant way of creating a larger string from lots of small pieces. To complement this, a smart slice type might also be an attractive target; one that breaks up a larger string into slices and provides operations on these, including joining them to form a new string. I'm not convinced that mucking with the underlying object type and doing "subtyping" on-the-fly is a clean design. ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-10 20:30 Message: Logged In: YES user_id=364875 Originator: YES Much of what I do in Python is text processing. My largest Python project to date was an IDL which spewed out loads of text; I've also written an HTML formatter or two. I seem to do an awful lot of string concatenation in Python, and I'd like it to be fast. I'm not alone in this, as there have been several patches to Python in recent years to speed up string concatenation. Perhaps you aren't familiar with my original justification for the patch. I've always hated the "".join() idiom for string concatenation, as it violates the "There should be one--and preferably only one--obvious way to do it" principle (and arguably others). With lazy concatenation, the obvious way (using +) becomes competitive with "".join(), thus dispensing with the need for this unobvious and distracting idiom. For a more thorough dissection of the (original) patch, including its implementation and lots of discussion from other people, please see the original thread on c.l.p: http://groups.google.com/group/comp.lang.python/browse_frm/thread/b8a8f20bc3c81bcf Please ignore the benchmarks there, as they were quite flawed. And, no, I haven't seen a lot of code manipulating Unicode strings yet, but then I'm not a Python shaker-and-mover. Obviously I expect to see a whole lot more when Py3k is adopted. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-10 18:24 Message: Logged In: YES user_id=341410 Originator: NO From what I understand, the point of the lazy strings patch is to make certain operations faster. What operations? Generally speaking, looped concatenation (x += y), and other looping operations that have traditionally been slow: O(n^2). While this error is still common among new users of Python, generally users only get bit once. They ask about it on python-list and are told: z = []; z.append(y); x = ''.join(z) .
Then again, the only place where I've seen the iterative building up of *text* is really in document reformatting (like textwrap). Basically all other use-cases (that I have seen) generally involve the manipulation of binary data. Larry, out of curiosity, have you found code out there that currently loops and concatenates unicode? ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:26 Message: Logged In: YES user_id=364875 Originator: YES Continuing the comedy of errors, concat patch #2 was actually the same as #1: it didn't have the fix for detecting a NULL return of PyMem_NEW(). Fixed in concat patch #3. (Deleting concat patch #2.) File Added: lch.py3k.unicode.lazy.concat.patch.3.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-09 01:10 Message: Logged In: YES user_id=364875 Originator: YES Revised the lazy concatenation patch to add (doh!) a check for when PyMem_NEW() fails in PyUnicode_AsUnicode(). File Added: lch.py3k.unicode.lazy.concat.patch.2.txt ---------------------------------------------------------------------- Comment By: Larry Hastings (lhastings) Date: 2007-01-08 18:50 Message: Logged In: YES user_id=364875 Originator: YES jcarlson: The first time someone calls PyUnicode_AsUnicode() on a concatenation object, it renders the string, and that's an O(something) operation. In general this rendering is O(i), aka linear time, though linear related to *what* depends. (It iterates over the m concatenated strings, and each of the n characters in those strings, and whether n or m is more important depends on their values.) After rendering, the object behaves like any other Unicode string, including O(1) for array element lookup. If you're referring to GvR's statement "I mention performance because s[i] should remain an O(1) operation.", here: http://mail.python.org/pipermail/python-3000/2006-December/005281.html I suspect this refers to the UCS-2 vs. UTF-16 debate. lemburg: Your criticisms are fair; lazy evaluation is a tradeoff. In general my response to theories about how it will affect performance is "I invite you to try it and see". As for causing memory errors, the only problem I see is not checking for a NULL return from PyMem_NEW() in PyUnicode_AsUnicode(). But that's a bug, not a flaw in my approach, and I'll fix that bug today. I don't see how "[my] approach can cause memory errors" in any sort of larger sense. ---------------------------------------------------------------------- Comment By: M.-A. Lemburg (lemburg) Date: 2007-01-08 10:59 Message: Logged In: YES user_id=38388 Originator: NO While I don't think the added complexity in the implementation is worth it, given that there are other ways of achieving the same kind of performance (e.g. list of Unicode strings), some comments: * you add a long field to every Unicode object - so every single object in the system pays 4-8 bytes for the small performance advantage * Unicode objects are often referenced using PyUnicode_AS_UNICODE(); this operation doesn't allow passing back errors, yet your lazy evaluation approach can cause memory errors - how are you going to deal with them ? (currently you don't even test for them) * the lazy approach keeps all partial Unicode objects alive until they finally get concatenated; if you have lots of those (e.g.
if you use x += y in a loop), then you pay the complete Python object overhead for every single partial Unicode object in the list of strings - given that most such operations use short strings, you are likely creating a memory overhead far greater than the total length of all the strings ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2007-01-07 05:08 Message: Logged In: YES user_id=341410 Originator: NO What are the performance characteristics of each operation? I presume that a + b for unicode strings a and b is O(1) time (if I understand your implementation correctly). But according to my reading, (a + b + c + ...)[i] is O(number of concatenations performed). Is this correct? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1629305&group_id=5470 From noreply at sourceforge.net Sun Jan 21 01:08:08 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 20 Jan 2007 16:08:08 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 23:46 Message generated for change (Comment added) made by jjlee You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399 Definitely a backport candidate. ---------------------------------------------------------------------- >Comment By: John J Lee (jjlee) Date: 2007-01-21 00:08 Message: Logged In: YES user_id=261020 Originator: YES Added tests. File Added: urllib2_close_socket_v2.patch ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-20 02:44 Message: Logged In: YES user_id=1591633 Originator: NO Patch looks good to me, and the tests still pass. If it matters, I would like to see a test case presented in the patch as well. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Sun Jan 21 06:26:25 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sat, 20 Jan 2007 21:26:25 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 17:46 Message generated for change (Comment added) made by mark-roberts You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399 Definitely a backport candidate. ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-20 23:26 Message: Logged In: YES user_id=1591633 Originator: NO I'd say it looks good. Now let's see if we can get someone to apply it for us. Thanks for adding the tests! ---------------------------------------------------------------------- Comment By: John J Lee (jjlee) Date: 2007-01-20 18:08 Message: Logged In: YES user_id=261020 Originator: YES Added tests. File Added: urllib2_close_socket_v2.patch ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-19 20:44 Message: Logged In: YES user_id=1591633 Originator: NO Patch looks good to me, and the tests still pass. If it matters, I would like to see a test case presented in the patch as well. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Sun Jan 21 10:33:31 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 21 Jan 2007 01:33:31 -0800 Subject: [Patches] [ python-Patches-1610575 ] C99 _Bool support for struct Message-ID: Patches item #1610575, was opened at 2006-12-07 06:37 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610575&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Modules Group: Python 2.6 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: David Remahl (chmod007) Assigned to: Nobody/Anonymous (nobody) Summary: C99 _Bool support for struct Initial Comment: C99 adds the fundamental _Bool integer type (fundamental in the sense that it is not equivalent to or a composite of any other C type). Its size can vary from platform to platform; the only restriction imposed by the C standard is that it must be able to hold the values 0 or 1. Typically, sizeof _Bool is 1 or 4. A struct module user trying to parse a native C structure that contains a _Bool member faces a problem: struct does not have a format character for _Bool. One is forced to hardcode a size for bool (use a char or an int instead). This patch adds support for a new format character, 't', representing the fundamental type _Bool. It is handled semantically as representing pure booleans -- when packing a structure the truth value of the argument to be packed is used and when unpacking either True or False is always returned. For platforms that don't support _Bool, as well as in non-native mode, 't' packs as a single byte. Test cases are included, as well as a small change to the struct documentation. The patch modifies configure.in to check for _Bool support, and the patch includes the autogenerated configure and pyconfig.h.in files as well. I have tested the module on Mac OS X x86 (uses 1 byte for _Bool) and Mac OS X ppc (uses 4 bytes for _Bool). Ran regression suite.
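As context, a short usage sketch of such a boolean format character. The patch text proposes 't'; the character present in current Python's struct module for native C99 _Bool is '?', which the sketch uses so that it runs as-is:

    import struct

    # Native mode: '?' maps to the platform's C99 _Bool.
    print(struct.calcsize('?'))          # sizeof(_Bool), typically 1
    data = struct.pack('B?', 7, True)    # an unsigned char plus a _Bool
    kind, flag = struct.unpack('B?', data)
    print(kind, flag)                    # 7 True (a tuple on 2.x)

    # In standard (non-native) mode the bool packs as a single byte,
    # and any nonzero byte unpacks as True.
    print(struct.unpack('<?', b'\x05'))  # (True,)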
---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-01-21 10:33 Message: Logged In: YES user_id=21627 Originator: NO Thanks for the patch. Committed as r53508 ---------------------------------------------------------------------- Comment By: David Remahl (chmod007) Date: 2006-12-08 08:13 Message: Logged In: YES user_id=2135 Originator: YES Oops! I didn't intend for there to be any ctypes content in this patch (as indicated by the subject), but apparently I forgot to remove part of the ctypes section. I have uploaded a new patch without that part. Once this has been integrated, I'll upload a complete ctypes patch for consideration. File Added: bool struct patch-2.diff ---------------------------------------------------------------------- Comment By: Thomas Heller (theller) Date: 2006-12-07 21:09 Message: Logged In: YES user_id=11105 Originator: NO The patch is not complete or not correct. Either: - the part of that patch that changes Modules/_ctypes/_ctypes.c should be omitted because it does not contain a ctypes _Bool type - or complete support for a ctypes _Bool type (what would that be called? ctypes.c99_bool?) should be added, together with tests in Lib/ctypes/test ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1610575&group_id=5470 From noreply at sourceforge.net Sun Jan 21 11:35:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Sun, 21 Jan 2007 02:35:30 -0800 Subject: [Patches] [ python-Patches-1627441 ] Fix for #1601399 (urllib2 does not close sockets properly) Message-ID: Patches item #1627441, was opened at 2007-01-03 23:46 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: John J Lee (jjlee) Assigned to: Nobody/Anonymous (nobody) Summary: Fix for #1601399 (urllib2 does not close sockets properly) Initial Comment: Fix for #1601399 Definitely a backport candidate. ---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-21 10:35 Message: Logged In: YES user_id=849994 Originator: NO Committed as rev. 53511, 53512 (2.5). ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-21 05:26 Message: Logged In: YES user_id=1591633 Originator: NO I'd say it looks good. Now let's see if we can get someone to apply it for us. Thanks for adding the tests! ---------------------------------------------------------------------- Comment By: John J Lee (jjlee) Date: 2007-01-21 00:08 Message: Logged In: YES user_id=261020 Originator: YES Added tests. File Added: urllib2_close_socket_v2.patch ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-20 02:44 Message: Logged In: YES user_id=1591633 Originator: NO Patch looks good to me, and the tests still pass. If it matters, I would like to see a test case presented in the patch as well.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1627441&group_id=5470 From noreply at sourceforge.net Mon Jan 22 12:52:40 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 03:52:40 -0800 Subject: [Patches] [ python-Patches-1641544 ] rlcompleter tab completion in pdb Message-ID: Patches item #1641544, was opened at 2007-01-22 11:52 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641544&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Stephen Emslie (stephenemslie) Assigned to: Nobody/Anonymous (nobody) Summary: rlcompleter tab completion in pdb Initial Comment: By default, Pdb and other instances of Cmd complete names for their commands. However in the context of pdb, I think it is more useful to complete identifiers and keywords in its current scope than to complete names of commands (most of which have single letter abbreviations). I believe this makes pdb a far more usable introspection tool. I have discussed this proposal on the python-ideas list: http://mail.python.org/pipermail/python-ideas/2007-January/000084.html This patch implements the following: - creates an rlcompleter instance on Pdb if readline is available - adds a 'complete' method to the Pdb class. The only difference with rlcompleter's default behaviour is that it also updates rlcompleter's namespace to reflect the current local and global namespace, which is necessary because pdb changes scope as it steps through a program This is a patch against python/Lib/pdb.py rev. 51745 ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641544&group_id=5470 From noreply at sourceforge.net Mon Jan 22 18:00:04 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 09:00:04 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 17:00 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Nobody/Anonymous (nobody) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping them in an internal dict (the loggerDict attribute of Manager). Attached a patch using a weakref object, with a test.
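A small demonstration of the reported behaviour (this only reproduces the retention the patch is meant to fix; it is not the patch itself):

    import logging

    # Every new name creates a Logger that the module-level Manager
    # retains in its loggerDict, even after the caller drops its
    # own reference.
    for i in range(1000):
        log = logging.getLogger('client-%d' % i)
        del log

    print(len(logging.Logger.manager.loggerDict))   # 1000: nothing was freed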
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Mon Jan 22 18:09:18 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 09:09:18 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 17:00 Message generated for change (Comment added) made by therve You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Nobody/Anonymous (nobody) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping them in an internal dict (the loggerDict attribute of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- >Comment By: TH (therve) Date: 2007-01-22 17:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping the loggers is mandatory because you must get the same instance back from getLogger. Maybe there needs to be a documented way to remove a logger from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Mon Jan 22 18:33:06 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 09:33:06 -0800 Subject: [Patches] [ python-Patches-1591665 ] adding __dir__ Message-ID: Patches item #1591665, was opened at 2006-11-06 23:52 Message generated for change (Settings changed) made by gangesmaster You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1591665&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 6 Private: No Submitted By: ganges master (gangesmaster) >Assigned to: Barry A. Warsaw (bwarsaw) Summary: adding __dir__ Initial Comment: in accordance with http://mail.python.org/pipermail/python-dev/2006-November/069865.html i've written a patch that allows objects to define their own introspection mechanisms, by providing __dir__. with this patch: * dir() returns the locals. this is done in builtin_dir() * dir(obj) returns the attributes of obj, by invoking PyObject_Dir() * if obj->ob_type has "__dir__", it is used. note that it must return a list!
* otherwise, use the default mechanism of collecting attributes * for module objects, return __dict__.keys() * for type objects, return __dict__.keys() + dir(obj.__base__) * for all other objects, return __dict__.keys() + __members__ + __methods__ + dir(obj.__class__) * builtin_dir takes care of sorting the list ---------------------------------------------------------------------- Comment By: ganges master (gangesmaster) Date: 2006-12-19 23:12 Message: Logged In: YES user_id=1406776 Originator: YES i guess the demo isn't updated/relevant anymore. instead, concrete tests were added to lib/tests/test_builtin.py ---------------------------------------------------------------------- Comment By: Armin Rigo (arigo) Date: 2006-11-23 14:11 Message: Logged In: YES user_id=4771 Originator: NO Line 20 in demo.py: assert "__getitem__" in dir(x) looks strange to me... Foo doesn't inherit from any sequence or mapping type. ---------------------------------------------------------------------- Comment By: ganges master (gangesmaster) Date: 2006-11-11 23:31 Message: Logged In: YES user_id=1406776 > PyObject_CallFunctionObjArgs(dirfunc, obj, NULL) done > Couldn't __dir__ also be allowed to return a tuple? no, because tuples are not sortable, and i don't want to over complicate the c-side code of PyObject_Dir. having __dir__ returning only a list is equivalent to __repr__ returning only strings. ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-11-11 21:58 Message: Logged In: YES user_id=849994 * Instead of doing PyObject_CallFunction(dirfunc, "O", obj) you should do PyObject_CallFunctionObjArgs(dirfunc, obj, NULL). * Couldn't __dir__ also be allowed to return a tuple? ---------------------------------------------------------------------- Comment By: ganges master (gangesmaster) Date: 2006-11-08 13:22 Message: Logged In: YES user_id=1406776 i like to init all my locals ("just in case"), but if the rest of the code does not adhere to my style, i'll change that. anyway, i made the changes to the code, updated the docs, and added full tests (the original dir() wasn't tested so thoroughly) -tomer ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2006-11-08 07:53 Message: Logged In: YES user_id=33168 tomer, do you know about configuring with --pydebug? That helps track down refleaks when running regrtest -R ::. object.c: _dir_locals: result is not necessary and locals doesn't need to be initialized as it's set on the next line. You could just declare and set it all on one line. _specialized_dir_type should be static. No need to init dict. Either don't init result or remove else result = NULL. I'd prefer removing the else and leaving the init. _specialized_dir_module should be static. No need to init dict. Can you get the name of the module and use that in the error msg: PyModule_GetName()? That would hopefully provide a nicer error msg. _generic_dir: No need to init dict. + /* XXX api change: how about falling to obj->ob_type + XXX if no __class__ exists? */ Do you mean falling *back*? Also, we've been using XXX(username): as the format for such comments. So this would be better as: /* XXX(tomer): Perhaps fall back to obj->ob_type if no __class__ exists? */ _dir_object: No need to init dirfunc. PyObject_Dir: No need to init result. Are there tests for all conditions?
At least: * dir() * dir(obj) * dir(obj_with_no_dict) * dir(obj_with_no__class__) * dir(obj_with__methods__) * dir(obj_with__members__) * dir(module) * dir(module_with_no__dict__) * dir(module_with_invalid__dict__) There also need to be updates to Doc/lib/libfuncs.tex. If you can't deal with the markup, just do the best you can in text and someone else will fix up the markup. Thanks for attaching the patch as a single file, it's easier to deal with. ---------------------------------------------------------------------- Comment By: ganges master (gangesmaster) Date: 2006-11-07 17:37 Message: Logged In: YES user_id=1406776 okay: * builtin_dir directly calls PyObject_Dir * PyObject_Dir handles NULL argument and sorting * it is now completely compatible with the 2.5 API * fixed several refcount bugs (i wish we had a tracing gc :) ---------------------------------------------------------------------- Comment By: Nick Coghlan (ncoghlan) Date: 2006-11-07 00:52 Message: Logged In: YES user_id=1038590 The retrieval of locals on a NULL argument and the sorting step need to move back inside PyObject_Dir to avoid changing the C API. If the standard library's current C API tests didn't break on this version of the patch, then the final version of the patch should include enhanced tests for PyObject_Dir that pass both before and after the patch is applied to PyObject_Dir. Other than that, I didn't see any major problems on reading the code (i.e. refcounts and error handling looked pretty reasonable). I haven't actually run it though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1591665&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:08:05 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:08:05 -0800 Subject: [Patches] [ python-Patches-1587674 ] Patch for #1586414 to avoid fragmentation on Windows Message-ID: Patches item #1587674, was opened at 2006-10-31 06:05 Message generated for change (Comment added) made by gustaebel You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1587674&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 >Status: Closed >Resolution: Rejected Priority: 5 Private: No Submitted By: Enoch Julias (enochjul) Assigned to: Lars Gustäbel (gustaebel) Summary: Patch for #1586414 to avoid fragmentation on Windows Initial Comment: Add a call to file.truncate() to inform Windows of the size of the target file in makefile(). This helps guide cluster allocation in NTFS to avoid fragmentation. ---------------------------------------------------------------------- >Comment By: Lars Gustäbel (gustaebel) Date: 2007-01-22 20:08 Message: Logged In: YES user_id=642936 Originator: NO Closed due to lack of interest. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-12-23 20:03 Message: Logged In: YES user_id=642936 Originator: NO Any progress on this one? ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-11-08 22:30 Message: Logged In: YES user_id=642936 You both still fail to convince me and I still don't see need for action.
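Stepping back to the __dir__ patch (#1591665) above for a moment: a minimal sketch, with invented names, of the hook it proposes. With the patch applied, dir(obj) invokes the type's __dir__ and requires a plain list, which builtin_dir() then sorts.

class Remote(object):
    """Proxy whose attribute names live on some other object."""
    def __init__(self, names):
        self._names = names
    def __dir__(self):
        # must return a list, not a tuple (see the discussion above)
        return list(self._names)

r = Remote(["ping", "reboot", "status"])
print(dir(r))   # with the patch: ['ping', 'reboot', 'status']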
The only case ATM where this addition makes sense (in your opinion) is the Windows OS when using the NTFS filesystem and certain conditions are met. NTFS has a preallocation algorithm to deal with this. We don't know if there is any advantage on FAT filesystems. On Linux for example there is a plethora of supported filesystems. Some of them may take advantage, others may not. Who knows? We can't even detect which filesystem type we are currently writing to. Apart from that, the behaviour of truncate(arg) with arg > filesize seems to be system-dependent. So, IMO this is a very special optimization targeted at a single platform. The TarFile class is easily subclassable, just override the makefile() method and add the two lines of code. I think that's what ActiveState's Python Cookbook is for. BTW, I like my files to grow bit by bit. In case of an error, I can detect if a file was not extracted completely by comparing the file sizes. Furthermore, a file that grows is more common and more what a programmer who uses this module might expect. ---------------------------------------------------------------------- Comment By: Josiah Carlson (josiahcarlson) Date: 2006-11-08 17:33 Message: Logged In: YES user_id=341410 I disagree with user gustaebel. We should be adding automatic truncate calls for all possible supported platforms, in all places where it could make sense. Be it in tarfile, zipfile, wherever we can. It would make sense to write a function that can be called by all of those modules so that there is only one place to update if/when changes occur. If the function were not part of the public Python API, then it wouldn't need to wait until 2.6, unless it were considered a feature addition rather than a bugfix. One would have to wait on a response from Martin or Anthony to know which it was, though I couldn't say for sure if operations that are generally performance enhancing are bugfixes or feature additions. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-11-06 22:57 Message: Logged In: YES user_id=642936 Personally, I think disk defragmenters are evil ;-) They create the need that they are supposed to satisfy at the same time. On Linux we have no defragmenters, so we don't bother about it. I think your proposal is some kind of a performance hack for a particular filesystem. In principle, this problem exists for all filesystems on all platforms. Fragmentation is IMO a filesystem's problem and is not so much a state but more like a process. Filesystems fragment over time and you can't do anything about it. For those people who care, disk defragmenters were invented. It is not tarfile.py's job to care about a fragmented filesystem, that's simply too low level. I admit that it is a small patch, but I'm -1 on having this applied. ---------------------------------------------------------------------- Comment By: Enoch Julias (enochjul) Date: 2006-11-06 18:19 Message: Logged In: YES user_id=6071 I have not really tested FAT/FAT32 yet as I don't use these filesystems now. The Disk Defragmenter tool in Windows 2000/XP shows the number of files/directories fragmented in its report. NTFS does handle growing files, but the operating system can only do so much without knowing the size of the file. Extracting from archives consisting of only several files does not cause fragmentation. However, if the archive has many files, it is much more likely that the default algorithm will fail to allocate contiguous clusters for some files.
It may also depend on the amount of free space fragmentation on a particular partition and whether other processes are writing to other files in the same partition. Some details of the cluster allocation algorithm used in Windows can be found at http://support.microsoft.com/kb/841551. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-11-01 16:27 Message: Logged In: YES user_id=642936 Is this merely an NTFS problem or is it the same with FAT fs? How do you detect file fragmentation? Doesn't this problem apply to all other modules or scripts that write to file objects as well? Shouldn't a decent filesystem be able to handle growing files in a correct manner? ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1587674&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:40:44 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:40:44 -0800 Subject: [Patches] [ python-Patches-1637157 ] urllib: change email.Utils -> email.utils Message-ID: Patches item #1637157, was opened at 2007-01-16 22:08 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637157&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 >Status: Closed >Resolution: Fixed Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: urllib: change email.Utils -> email.utils Initial Comment: urllib uses the old name email.Utils instead of the new name email.utils. This confuses py2app and possibly other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right). ---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-22 19:40 Message: Logged In: YES user_id=849994 Originator: NO Fixed in trunk. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637157&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:41:02 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:41:02 -0800 Subject: [Patches] [ python-Patches-1637159 ] urllib2: email.Utils->email.utils Message-ID: Patches item #1637159, was opened at 2007-01-16 22:09 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637159&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: urllib2: email.Utils->email.utils Initial Comment: urllib2 uses the old name email.Utils instead of the new name email.utils. This may confuse py2app and/or other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right).
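For context on the email renames in #1637157 and #1637159 (and #1637162 below): email 4.0, shipped with Python 2.5, lowercased the submodule names while keeping the old mixed-case names importable for backwards compatibility, so the change is a mechanical respelling. A sketch:

# old spelling, still importable in Python 2.5 but reportedly confusing
# to packagers such as py2app (see the reports above):
from email.Utils import formatdate
# new spelling, which the patched urllib/urllib2/smtplib use:
from email.utils import formatdate

print(formatdate())   # e.g. 'Thu, 25 Jan 2007 10:38:26 -0000'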
---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-22 19:41 Message: Logged In: YES user_id=849994 Originator: NO Fixed in trunk. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637159&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:41:20 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:41:20 -0800 Subject: [Patches] [ python-Patches-1637162 ] smtplib email renames Message-ID: Patches item #1637162, was opened at 2007-01-16 22:11 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637162&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed >Resolution: Fixed Priority: 5 Private: No Submitted By: Russell Owen (reowen) Assigned to: Nobody/Anonymous (nobody) Summary: smtplib email renames Initial Comment: smtplib uses the old names email.Utils and email.base64MIME instead of the new email.utils and email.base64mime. This may confuse py2app and/or other packagers. Note: this diff is against python/trunk/Lib/ rev 53110 (I'm not sure if I set the Group right). ---------------------------------------------------------------------- >Comment By: Georg Brandl (gbrandl) Date: 2007-01-22 19:41 Message: Logged In: YES user_id=849994 Originator: NO Fixed in trunk. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1637162&group_id=5470 From noreply at sourceforge.net Mon Jan 22 20:44:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 11:44:30 -0800 Subject: [Patches] [ python-Patches-1635639 ] ConfigParser does not quote % Message-ID: Patches item #1635639, was opened at 2007-01-15 02:43 Message generated for change (Comment added) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635639&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. >Category: None >Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Mark Roberts (mark-roberts) Assigned to: Nobody/Anonymous (nobody) Summary: ConfigParser does not quote % Initial Comment: This is covered by bug 1603688 (https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1603688&group_id=5470) I implemented 2 versions of this patch. One version raises ValueError when an invalid interpolation syntax is encountered (such as foo%, foo%bar, and %foo, but not %%foo and %(dir)foo). The other version simply replaces appropriate %s with %%s. Initially, I believed ValueError was the appropriate way to go with this. However, when I thought about how I use ConfigParser, I realized that it would be far nicer if it simply worked. I'm +0.5 to ValueError, and +1 to munging the values. ---------------------------------------------------------------------- >Comment By: A.M. Kuchling (akuchling) Date: 2007-01-22 14:44 Message: Logged In: YES user_id=11375 Originator: NO Turning into a patch. 
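To make the interpolation problem in #1635639 concrete, a hedged sketch against Python 2's SafeConfigParser (the section and option names here are invented): a bare '%' in a value breaks interpolation unless it is escaped as '%%' or interpolation is bypassed.

import ConfigParser
from StringIO import StringIO

cfg = ConfigParser.SafeConfigParser()
cfg.readfp(StringIO("[paths]\nlog = /var/log/app/100%\n"))

# cfg.get("paths", "log") raises InterpolationSyntaxError here, because
# '%' must be followed by '%' or '(' during interpolation.
print(cfg.get("paths", "log", raw=True))   # raw access skips interpolation

cfg.set("paths", "log", "/var/log/app/100%%")   # the escaped form round-trips
print(cfg.get("paths", "log"))                  # -> /var/log/app/100%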
---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-15 21:17 Message: Logged In: YES user_id=1591633 Originator: YES For the record, this was supposed to be a patch. I don't know if the admins have any way of moving it to that category. I guess that explained the funky categories and groups. Sorry for the inconvenience. ---------------------------------------------------------------------- Comment By: Mark Roberts (mark-roberts) Date: 2007-01-15 02:44 Message: Logged In: YES user_id=1591633 Originator: YES File Added: bug_1603688_cfgparser_munges.patch ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1635639&group_id=5470 From noreply at sourceforge.net Tue Jan 23 05:47:12 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 22 Jan 2007 20:47:12 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 09:00 Message generated for change (Comment added) made by nnorwitz You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: TH (therve) >Assigned to: Vinay Sajip (vsajip) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping it into an internal dict (attribute loggerDict of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- >Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-22 20:47 Message: Logged In: YES user_id=33168 Originator: NO Vinay, can you provide some direction? Thanks. ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-22 09:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping it is mandatory because you must get the same instance with getLogger. Maybe it'd need a documented way to remove from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470
From noreply at sourceforge.net Tue Jan 23 09:42:30 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 00:42:30 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 17:00 Message generated for change (Comment added) made by vsajip You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 >Status: Closed >Resolution: Invalid Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Vinay Sajip (vsajip) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping it into an internal dict (attribute loggerDict of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- >Comment By: Vinay Sajip (vsajip) Date: 2007-01-23 08:42 Message: Logged In: YES user_id=308438 Originator: NO This is not a leak - it's by design. You are not using best practice when you create a logger per client; the specific scenario of getting connection info in the logging message can currently be done in several ways, e.g. 1. Use the 'extra' parameter (added in Python 2.5). 2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info. 3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-23 04:47 Message: Logged In: YES user_id=33168 Originator: NO Vinay, can you provide some direction? Thanks. ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-22 17:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping it is mandatory because you must get the same instance with getLogger. Maybe it'd need a documented way to remove from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Tue Jan 23 09:54:54 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 00:54:54 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 17:00 Message generated for change (Comment added) made by therve You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Library (Lib) Group: Python 2.6 Status: Closed Resolution: Invalid Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Vinay Sajip (vsajip) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping it into an internal dict (attribute loggerDict of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- >Comment By: TH (therve) Date: 2007-01-23 08:54 Message: Logged In: YES user_id=1038797 Originator: YES OK I understand the design. But it's not clear in the documentation that once you've called getLogger('id') the logger will live forever. It's especially problematic in long-running processes. It would be great to have at least a warning in the documentation about this feature. ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-23 08:42 Message: Logged In: YES user_id=308438 Originator: NO This is not a leak - it's by design. You are not using best practice when you create a logger per client; the specific scenario of getting connection info in the logging message can currently be done in several ways, e.g. 1. Use the 'extra' parameter (added in Python 2.5). 2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info. 3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-23 04:47 Message: Logged In: YES user_id=33168 Originator: NO Vinay, can you provide some direction? Thanks. ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-22 17:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping it is mandatory because you must get the same instance with getLogger. Maybe it'd need a documented way to remove from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Tue Jan 23 12:18:39 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 03:18:39 -0800 Subject: [Patches] [ python-Patches-1507247 ] tarfile extraction does not honor umask Message-ID: Patches item #1507247, was opened at 2006-06-16 14:11 Message generated for change (Comment added) made by gustaebel You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1507247&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 >Status: Closed >Resolution: Accepted Priority: 5 Private: No Submitted By: Faik Uygur (faik) Assigned to: Lars Gustäbel (gustaebel) Summary: tarfile extraction does not honor umask Initial Comment: If the upperdirs in the member file's pathname do not exist,
tarfile creates those paths with 0777 permission bits and does not honor umask. This patch uses umask to set the ti.mode of the created directory for later usage in chmod.

--- tarfile.py (revision 46993)
+++ tarfile.py (working copy)
@@ -1560,7 +1560,9 @@
             ti = TarInfo()
             ti.name = upperdirs
             ti.type = DIRTYPE
-            ti.mode = 0777
+            umask = os.umask(0)
+            ti.mode = 0777 - umask
+            os.umask(umask)
             ti.mtime = tarinfo.mtime
             ti.uid = tarinfo.uid
             ti.gid = tarinfo.gid

---------------------------------------------------------------------- >Comment By: Lars Gustäbel (gustaebel) Date: 2007-01-23 12:18 Message: Logged In: YES user_id=642936 Originator: NO Committed my patch as rev. 53526. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-12-31 12:52 Message: Logged In: YES user_id=642936 Originator: NO I've come to the conclusion that it is a doubtful approach to take the mtime and ownership from the file and use it on the upper directories as well. So, I've come up with a totally different solution (cp. makedirs.diff) that abandons the use of os.umask() completely and uses a single call to os.makedirs() to create the missing directories. It seems very attractive to me to do it this way, what do you think? File Added: makedirs.diff ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-30 19:25 Message: Logged In: YES user_id=161998 Originator: NO umask(2) works in the same way, so there seems to be no unixy way to inspect umask without setting it. I think the solution would be to make a C-level function to return the umask (by setting and resetting it). As the interpreter itself is single threaded, this is race-free. ---------------------------------------------------------------------- Comment By: Lars Gustäbel (gustaebel) Date: 2006-12-30 13:11 Message: Logged In: YES user_id=642936 Originator: NO In order to determine the current umask we have no other choice AFAIK than to set it with a bogus value, save the return value and restore it right away - as you proposed in your patch. The problem is that there is a small window of time between these two calls where the umask is invalid. This is especially bad in multi-threaded environments. Any ideas? ---------------------------------------------------------------------- Comment By: Han-Wen Nienhuys (hanwen) Date: 2006-12-06 00:40 Message: Logged In: YES user_id=161998 Originator: NO Hi, I can reproduce this problem on python 2.4, and patch applies to python 2.5 too. Fix looks good to me. ---------------------------------------------------------------------- Comment By: Faik Uygur (faik) Date: 2006-08-18 11:44 Message: Logged In: YES user_id=1541018 Above patch is wrong. The correct one is attached.
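A pure-Python sketch of the umask idiom debated in #1507247 (POSIX has no "get umask" call, hence the set-and-restore dance and the race window the reviewers worry about; Python 2 octal literals):

import os

def current_umask():
    # Set a bogus umask, then immediately restore the old one. Between
    # the two calls the process umask is 0: the race discussed above.
    mask = os.umask(0)
    os.umask(mask)
    return mask

print(oct(0777 & ~current_umask()))   # mode a newly created directory gets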
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1507247&group_id=5470 From noreply at sourceforge.net Tue Jan 23 15:02:33 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 06:02:33 -0800 Subject: [Patches] [ python-Patches-1642547 ] Fix error/crash in AST: syntaxerror in complex ifs Message-ID: Patches item #1642547, was opened at 2007-01-23 15:02 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None Status: Open Resolution: None Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Neal Norwitz (nnorwitz) Summary: Fix error/crash in AST: syntaxerror in complex ifs Initial Comment: Fix a bug in Python/ast.c, where a particular syntaxerror in an 'if' with one or more 'elif's would be ignored or mishandled:

timberwolf:~/python/python/trunk > cat test2.py
def bug():
    if w:
        dir()=1
    elif v:
        pass
timberwolf:~/python/python/trunk > python2.4 test2.py
  File "test2.py", line 3
    dir()=1
SyntaxError: can't assign to function call
timberwolf:~/python/python/trunk > python2.5 test2.py
Exception exceptions.SyntaxError: ("can't assign to function call", 3) in 'garbage collection' ignored
Fatal Python error: unexpected exception during garbage collection
Aborted

The actual problem is the lack of error checks on the return values of ast_for_expr() and ast_for_suite, in ast_for_if_stmt. Attached patch fixes. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642547&group_id=5470 From noreply at sourceforge.net Tue Jan 23 15:02:53 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 06:02:53 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 23:55 Message generated for change (Settings changed) made by twouters You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None >Status: Closed >Resolution: Fixed Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Wouters (twouters) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain, when someone only replaces one of them (in sitecustomize.)
Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this, from a unittest :P ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 07:56 Message: Logged In: YES user_id=33168 Originator: NO Forgot to mention that I agree about the warning. If no one noticed so far, this is such an obscure case, it's not that important to warn. Either way is fine with me. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 07:55 Message: Logged In: YES user_id=33168 Originator: NO I can think of a nasty way to test this, but it's not really worth it. You'd need to 'install' your own sitecustomize.py by setting PYTHONPATH and spawning a python. Ok, so it's not a real unit test, but it is a test. :-) This looks like it will also crash (before and after the patch) if sys.std{in,out,err} are just deleted rather than replaced (pythonrun.c). sysmodule.c looks fine. I think this is fine for 2.5.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Tue Jan 23 15:04:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 06:04:27 -0800 Subject: [Patches] [ python-Patches-1630975 ] Fix crash when replacing sys.stdout in sitecustomize Message-ID: Patches item #1630975, was opened at 2007-01-08 23:55 Message generated for change (Comment added) made by twouters You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: None Status: Closed Resolution: Fixed Priority: 9 Private: No Submitted By: Thomas Wouters (twouters) Assigned to: Thomas Wouters (twouters) Summary: Fix crash when replacing sys.stdout in sitecustomize Initial Comment: When replacing sys.stdout, stderr and/or stdin with non-file, file-like objects in sitecustomize, and also having an environment that makes Python set the encoding of those streams, Python will crash. PyFile_SetEncoding() will be called after sys.stdout/stderr/stdin are replaced, passing the non-file objects. Fix by not calling PyFile_SetEncoding() in these cases. I'm not entirely sure if we should warn or not; not setting encoding only for replaced streams may cause a disconnect between stdout and stderr that's hard to explain, when someone only replaces one of them (in sitecustomize.) Then again, not many people must be doing it, as it currently just crashes. No idea how to test for this, from a unittest :P ---------------------------------------------------------------------- >Comment By: Thomas Wouters (twouters) Date: 2007-01-23 15:04 Message: Logged In: YES user_id=34209 Originator: YES Oh, for the record: I was unable to produce a crash by *deleting* sys.stdin/stdout/stderr (although it produced funny results. In particular when I added a 'print' statement after the deletes, in my sitecustomize.py, to make sure it was getting run.
Of course, the print never arrived ;-P) ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 07:56 Message: Logged In: YES user_id=33168 Originator: NO Forgot to mention that I agree about the warning. If no one noticed so far, this is such an obscure case, it's not that important to warn. Either way is fine with me. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 07:55 Message: Logged In: YES user_id=33168 Originator: NO I can think of a nasty way to test this, but it's not really worth it. You'd need to 'install' your own sitecustomize.py by setting PYTHONPATH and spawning a python. Ok, so it's not a real unit test, but it is a test. :-) This looks like it will also crash (before and after the patch) if sys.std{in,out,err} are just deleted rather than replaced (pythonrun.c). sysmodule.c looks fine. I think this is fine for 2.5.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1630975&group_id=5470 From noreply at sourceforge.net Tue Jan 23 18:06:59 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 09:06:59 -0800 Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers Message-ID: Patches item #1641790, was opened at 2007-01-22 18:00 Message generated for change (Comment added) made by pitrou You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.6 Status: Closed Resolution: Invalid Priority: 5 Private: No Submitted By: TH (therve) Assigned to: Vinay Sajip (vsajip) Summary: logging leaks loggers Initial Comment: In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately logging leaks loggers by keeping it into an internal dict (attribute loggerDict of Manager). Attached a patch using a weakref object, with a test. ---------------------------------------------------------------------- Comment By: Antoine Pitrou (pitrou) Date: 2007-01-23 18:06 Message: Logged In: YES user_id=133955 Originator: NO Ok, since I was the one bitten by this bug I might as well add my 2 cents to the discussion. vsajip: > 1. Use the 'extra' parameter (added in Python 2.5). This is not practical. I want to define a prefix once and for all for all log messages that will be output in a given context. Explicitly adding a parameter to every log call does not help. (of course I can write wrappers to do this automatically - and that's what I ended up doing -, but then I must write 6 of them: one for each of "debug", "info", "warning", "error", "critical", and "exception"...) > 2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info. I don't even know what this means, but it sounds way overkill... > 3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing. 
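(A sketch of option 3 for concreteness; the class and names here are invented, and str() is only called when the record is actually formatted:)

import logging

class ConnMessage(object):
    """Message object that prefixes connection info when rendered."""
    def __init__(self, ip, port, text):
        self.ip, self.port, self.text = ip, port, text
    def __str__(self):
        return "%s:%s %s" % (self.ip, self.port, self.text)

logging.basicConfig()
logging.getLogger("server").warning(ConnMessage("10.0.0.1", 4242, "connection lost"))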
IIUC this means some explicit machinery on each logging call, since I have to wrap every string in a constructor. Just like the "extra" parameter, with a slightly different flavour. It's disturbing that the logging module has so many powerful options but no way of conveniently doing simple things without creating memory leaks... ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-23 09:54 Message: Logged In: YES user_id=1038797 Originator: YES OK I understand the design. But it's not clear in the documentation that once you've called getLogger('id') the logger will live forever. It's especially problematic on long-running processes. It would be great to have at least a warning in the documentation about this feature. ---------------------------------------------------------------------- Comment By: Vinay Sajip (vsajip) Date: 2007-01-23 09:42 Message: Logged In: YES user_id=308438 Originator: NO This is not a leak - it's by design. You are not using best practice when you create a logger per client; the specific scenario of getting connection info in the logging message can currently be done in several ways, e.g. 1. Use the 'extra' parameter (added in Python 2.5). 2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info. 3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing. ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-23 05:47 Message: Logged In: YES user_id=33168 Originator: NO Vinay, can you provide some direction? Thanks. ---------------------------------------------------------------------- Comment By: TH (therve) Date: 2007-01-22 18:09 Message: Logged In: YES user_id=1038797 Originator: YES Looking at the documentation, it seems keeping it is mandatory because you must get the same instance with getLogger. Maybe it'd need a documented way to remove from the dict, though. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470 From noreply at sourceforge.net Tue Jan 23 20:34:28 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Tue, 23 Jan 2007 11:34:28 -0800 Subject: [Patches] [ python-Patches-1642844 ] comments to clarify complexobject.c Message-ID: Patches item #1642844, was opened at 2007-01-23 14:34 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642844&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jim Jewett (jimjjewett) Assigned to: Nobody/Anonymous (nobody) Summary: comments to clarify complexobject.c Initial Comment: The constructor for a complex takes two values, representing the real and imaginary parts. Obviously, these should normally both be real numbers, but they don't have to be.
The code to cater to complex arguments led to even Tim Peters asking WTF? http://mail.python.org/pipermail/python-dev/2007-January/070732.html This patch just adds comments. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642844&group_id=5470 From noreply at sourceforge.net Wed Jan 24 16:50:29 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 24 Jan 2007 07:50:29 -0800 Subject: [Patches] [ python-Patches-1643641 ] Fix Bug 1362475 Text.edit_modified() doesn't work Message-ID: Patches item #1643641, was opened at 2007-01-24 15:50 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643641&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Tkinter Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Matthias Kievernagel (mkiever) Assigned to: Martin v. Löwis (loewis) Summary: Fix Bug 1362475 Text.edit_modified() doesn't work Initial Comment: Text.edit_modified() called _getints() for boolean return values causing an exception. The patch below removes _getints call. The other Text.edit_*() functions have no return values so they still work after applying the patch. Greetings, Matthias Kievernagel ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643641&group_id=5470 From noreply at sourceforge.net Wed Jan 24 22:12:11 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Wed, 24 Jan 2007 13:12:11 -0800 Subject: [Patches] [ python-Patches-1643874 ] ctypes leaks memory Message-ID: Patches item #1643874, was opened at 2007-01-24 22:12 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643874&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Thomas Heller (theller) Assigned to: Thomas Heller (theller) Summary: ctypes leaks memory Initial Comment: This program leaks memory, because a string is allocated with the win32 call SysAllocString(), but SysFreeString() is never called.

"""
from ctypes import oledll, _SimpleCData

class BSTR(_SimpleCData):
    _type_ = "X"

func = oledll.oleaut32.SysStringLen
func.argtypes = (BSTR,)

while 1:
    func("abcdefghijk")
"""

The attached patch fixes this. (The BSTR data type is not exposed by ctypes or ctypes.wintypes, because it is only used in connection with Windows COM objects.) The patch should be applied to release25-maint and trunk.
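Until the fix lands, a Windows-only sketch of the manual pairing that the patch automates for the "X" (BSTR) type: every SysAllocString() needs a matching SysFreeString().

from ctypes import windll, c_wchar_p, c_void_p

oleaut32 = windll.oleaut32
oleaut32.SysAllocString.restype = c_void_p   # keep the full pointer on 64-bit
oleaut32.SysAllocString.argtypes = [c_wchar_p]

bstr = oleaut32.SysAllocString(u"abcdefghijk")
try:
    print(oleaut32.SysStringLen(c_void_p(bstr)))   # -> 11
finally:
    oleaut32.SysFreeString(c_void_p(bstr))         # no leak this time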
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643874&group_id=5470 From noreply at sourceforge.net Thu Jan 25 10:00:35 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 25 Jan 2007 01:00:35 -0800 Subject: [Patches] [ python-Patches-1638879 ] Fix to the long("123\0", 10) problem Message-ID: Patches item #1638879, was opened at 2007-01-18 20:03 Message generated for change (Comment added) made by lhorn You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638879&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Calvin Spealman (ironfroggy) Assigned to: Nobody/Anonymous (nobody) Summary: Fix to the long("123\0", 10) problem Initial Comment: This is a simple patch adapted from the int_new function to the long_new function. ---------------------------------------------------------------------- Comment By: Lutz Horn (lhorn) Date: 2007-01-25 10:00 Message: Logged In: YES user_id=96760 Originator: NO This patch compiles and passes all tests against revisions 53406 and 53549 on Ubuntu 6.06.1. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638879&group_id=5470 From noreply at sourceforge.net Thu Jan 25 10:38:26 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 25 Jan 2007 01:38:26 -0800 Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib Message-ID: Patches item #1644218, was opened at 2007-01-25 10:38 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Library (Lib) Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Daniel Nogradi (nogradi) Assigned to: Nobody/Anonymous (nobody) Summary: file -> open in stdlib Initial Comment: AFAIK using file( ) to open a file is deprecated in favor of open( ), and while grepping through the stdlib I noticed a couple of occurrences of file( ) in the latest revision. This patch changes these calls to open( ); all tests pass. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470 From noreply at sourceforge.net Thu Jan 25 18:57:20 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Thu, 25 Jan 2007 09:57:20 -0800 Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe Message-ID: Patches item #1564547, was opened at 2006-09-24 15:13 Message generated for change (Comment added) made by gustavo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Gustavo J. A. M. Carneiro (gustavo) Assigned to: Nobody/Anonymous (nobody) Summary: Py_signal_pipe Initial Comment: Problem: how to wakeup extension modules running poll() so that they can let python check for signals. Solution: use a pipe to communicate between signal handlers and main thread. The read end of the pipe can then be monitored by poll/select for input events and wake up poll(). As a side benefit, it avoids the usage of Py_AddPendingCall / Py_MakePendingCalls, which are patently not "async safe". All explained in this thread: http://mail.python.org/pipermail/python-dev/2006-September/068569.html ---------------------------------------------------------------------- >Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-25 17:57 Message: Logged In: YES user_id=908 Originator: YES Damn this SF bug tracker! ;( The patch I uploaded (yes, it was me, not anonymous) fixes some bugs and also fixes http://www.python.org/sf/1643738 ---------------------------------------------------------------------- Comment By: Adam Olsen (rhamphoryncus) Date: 2006-09-29 22:09 Message: Logged In: YES user_id=12364 I'm concerned about the interface to PyOS_InterruptOccurred(). The original version peeked ahead for only that signal, and handled it manually. No need to report errors. The new version will first call arbitrary python functions to handle any earlier signals, then an arbitrary python function for the interrupt itself, and then will not report any errors they produce. It may not even get to the interrupt, even if one is waiting. I'm not sure PyOS_InterruptOccurred() is called when arbitrary python code is acceptable. I suspect it should be dropped entirely, in favour of a more robust API. Otoh, some of it appears quite crufty. One version in intrcheck.c lacks a return statement, invoking undefined behavior in C. One other concern I have is that signalmodule.c should never be unloaded, if loaded via dlopen. A delayed signal handler may reference it indefinitely. However, I see no sane way to enforce this. ---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2006-09-28 16:31 Message: Logged In: YES user_id=908 > ...sizeof(char) will STILL return 1 in such a case... Even if sizeof(char) == 1, 'sizeof(signum_c)' is much more readable than just a plain '1'.
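The idea at the heart of this patch, the classic self-pipe trick, can be sketched at the Python level (POSIX and Python 2 assumed; the C patch does the equivalent with Py_signal_pipe):

import os, select, signal

rfd, wfd = os.pipe()

def handler(signum, frame):
    # the C patch writes the signal number from the C-level handler;
    # one byte per signal, dropped if the pipe ever fills up
    os.write(wfd, chr(signum))

signal.signal(signal.SIGUSR1, handler)
os.kill(os.getpid(), signal.SIGUSR1)    # deliver a signal to ourselves

# a poll()/select() loop can watch rfd alongside its other descriptors
# and wakes up as soon as a signal arrives:
readable, _, _ = select.select([rfd], [], [], 1.0)
if readable:
    print("got signal %d" % ord(os.read(rfd, 1)))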
---------------------------------------------------------------------- Comment By: Adam Olsen (rhamphoryncus) Date: 2006-09-28 03:50 Message: Logged In: YES user_id=12364 Any compiler where sizeof(char) != 1 is *deeply* broken. In C, a byte isn't always 8 bits (if it uses bits at all!). It's possible for a char to take (for instance) 32 bits, but sizeof(char) will STILL return 1 in such a case. A mention of this in the wild is here: http://lkml.org/lkml/1998/1/22/4 If you find a compiler that's broken, I'd love to hear about it. :) # error Too many signals to fit on an unsigned char! Should be "in", not "on" :) A comment in signal_handler() about ignoring the return value of write() may be good. initsignal() should avoid replacing Py_signal_pipe/Py_signal_pipe_w if called a second time (which is possible, right?). If so, it should probably not set them until after setting non-blocking mode. check_signals() should not call PyEval_CallObject(Handlers[signum].func, ...) if func is NULL, which may happen after finisignal() clears it.
---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2006-09-27 15:34 Message: Logged In: YES user_id=908 and of course this > * PyErr_SetInterrupt() needs to set is_tripped after the call to write(), not before. is correct, good catch. New patch uploaded. ---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2006-09-27 14:42 Message: Logged In: YES user_id=908 > * Needs documentation ... True, I'll try to add more documentation... > * I think we should be more paranoid about the range of possible signals. NSIG does not appear to be defined by SUSv2 (no clue about Posix). We should size the Handlers array to UCHAR_MAX and set any signals outside the range of 0..UCHAR_MAX to either 0 (null signal) or UCHAR_MAX. I'm not sure we should ever use NSIG. I disagree. Creating an array of size UCHAR_MAX is just wasting memory. If you check the original python code, there's already fallback code to define NSIG if it's not already defined (if not defined, it could end up being defined as 64). > * In signal_handler() sizeof(signum_c) is inherently 1. ;) And? I occasionally hear horror stories of platforms where sizeof(char) != 1, I'm not taking any chances :) > * PyOS_InterruptOccurred() should probably still check that it's called from the main thread. check_signals already bails out if that is the case. But in fact it bails out without setting the interrupt_occurred output parameter, so I fixed that. fcntl error checking... will work on it. ---------------------------------------------------------------------- Comment By: Adam Olsen (rhamphoryncus) Date: 2006-09-27 00:53 Message: Logged In: YES user_id=12364 I've looked over the patch, although I haven't tested it. I have the following suggestions: * Needs documentation explaining the signal weirdness (may drop signals, may delay indefinitely, new handlers may get signals intended for old, etc) * Needs to be explicit that users must only poll/select to check for readability of the pipe, NOT read from it * The comment for is_tripped refers to sigcheck(), which doesn't exist * I think we should be more paranoid about the range of possible signals. NSIG does not appear to be defined by SUSv2 (no clue about Posix). We should size the Handlers array to UCHAR_MAX and set any signals outside the range of 0..UCHAR_MAX to either 0 (null signal) or UCHAR_MAX. I'm not sure we should ever use NSIG. * In signal_handler() sizeof(signum_c) is inherently 1. ;) * The set_nonblock macro doesn't check for errors from fcntl(). I'm not sure it's worth having a macro for that anyway. * Needs some documentation of the assumptions about read()/write() being memory barriers. * In check_signals() sizeof(signum) is inherently 1. * There's a blank line with tabs near the end of check_signals() ;) * PyErr_SetInterrupt() should use a compile-time check for SIGINT being within 0..UCHAR_MAX, assuming NSIG is ripped out entirely. * PyErr_SetInterrupt() needs to set is_tripped after the call to write(), not before. * PyOS_InterruptOccurred() should probably still check that it's called from the main thread.
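Regarding the set_nonblock remark in the review above, a Python-level sketch of the same setup; here the fcntl() calls raise on failure instead of being silently ignored:

import fcntl, os

rfd, wfd = os.pipe()
for fd in (rfd, wfd):
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)   # raises on failure
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)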
----------------------------------------------------------------------

Comment By: Adam Olsen (rhamphoryncus)
Date: 2006-09-28 03:50

Message:
Logged In: YES
user_id=12364

Any compiler where sizeof(char) != 1 is *deeply* broken. In C, a byte isn't always 8 bits (if it uses bits at all!). It's possible for a char to take (for instance) 32 bits, but sizeof(char) will STILL return 1 in such a case. A mention of this in the wild is here: http://lkml.org/lkml/1998/1/22/4

If you find a compiler that's broken, I'd love to hear about it. :)

# error Too many signals to fit on an unsigned char!

Should be "in", not "on" :)

A comment in signal_handler() about ignoring the return value of write() may be good.

initsignal() should avoid replacing Py_signal_pipe/Py_signal_pipe_w if called a second time (which is possible, right?). In any case, it should probably not set them until after setting non-blocking mode.

check_signals() should not call PyEval_CallObject(Handlers[signum].func, ...) if func is NULL, which may happen after finisignal() clears it.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470
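[A minimal pure-Python sketch of the mechanism the patch implements in C. The real patch works inside signalmodule.c; the names below are illustrative, and select.poll() assumes a Unix platform. The signal handler does nothing but write one byte to a pipe; poll() then wakes up on the read end and the real work happens outside the handler.]

    import os
    import select
    import signal

    rfd, wfd = os.pipe()

    def handler(signum, frame):
        # The only async work: write a single byte naming the signal.
        os.write(wfd, chr(signum))

    signal.signal(signal.SIGUSR1, handler)
    os.kill(os.getpid(), signal.SIGUSR1)

    # An event loop (or any extension module running poll) can watch rfd.
    poller = select.poll()
    poller.register(rfd, select.POLLIN)
    for fd, event in poller.poll():
        print "woken up by signal", ord(os.read(rfd, 1))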
From noreply at sourceforge.net Thu Jan 25 19:14:37 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 10:14:37 -0800
Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib
Message-ID:

Patches item #1644218, was opened at 2007-01-25 09:38
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Daniel Nogradi (nogradi)
Assigned to: Nobody/Anonymous (nobody)
Summary: file -> open in stdlib

Initial Comment:
AFAIK using file() to open a file is deprecated in favor of open(), and while grepping through the stdlib I noticed a couple of occurrences of file() in the latest revision. This patch changes these calls to open(); all tests pass.

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-25 18:14

Message:
Logged In: YES
user_id=849994
Originator: NO

I think we should do this at least in the Py3k branch.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

From noreply at sourceforge.net Thu Jan 25 19:38:40 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 10:38:40 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 08:13
Message generated for change (Comment added) made by rhamphoryncus
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------

Comment By: Adam Olsen (rhamphoryncus)
Date: 2007-01-25 11:38

Message:
Logged In: YES
user_id=12364
Originator: NO

gustavo, there's two patches attached and it's not entirely clear which one is current. Please delete the older one.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470
From noreply at sourceforge.net Thu Jan 25 20:22:19 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 11:22:19 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 10:13
Message generated for change (Comment added) made by kuran
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------

Comment By: Jp Calderone (kuran)
Date: 2007-01-25 14:22

Message:
Logged In: YES
user_id=366566
Originator: NO

The attached patch also fixes a bug in the order in which signal handlers are run. Previously, they would be run in numerically ascending signal number order. With the patch attached, they will be run in the order they are processed by Python.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470
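[A toy illustration of the ordering change Jp describes, under the assumption that the pipe carries one byte per delivered signal: draining the pipe replays signals in arrival order, whereas the old code scanned the Handlers array in ascending signal-number order. The numbers below are made up.]

    # Hypothetical delivery order of three signals, as bytes in the pipe.
    arrived = [30, 2, 10]

    pipe_order = list(arrived)    # patched: first-in, first-out
    scan_order = sorted(arrived)  # unpatched: ascending signal numbers

    print pipe_order              # [30, 2, 10]
    print scan_order              # [2, 10, 30]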
From noreply at sourceforge.net Thu Jan 25 20:24:41 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 11:24:41 -0800
Subject: [Patches] [ python-Patches-1643874 ] ctypes leaks memory
Message-ID:

Patches item #1643874, was opened at 2007-01-24 22:12
Message generated for change (Comment added) made by theller
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643874&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.5
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Thomas Heller (theller)
Assigned to: Thomas Heller (theller)
Summary: ctypes leaks memory

Initial Comment:
This program leaks memory, because a string is allocated with the win32 call SysAllocString(), but SysFreeString() is never called.

"""
from ctypes import oledll, _SimpleCData

class BSTR(_SimpleCData):
    _type_ = "X"

func = oledll.oleaut32.SysStringLen
func.argtypes = (BSTR,)

while 1:
    func("abcdefghijk")
"""

The attached patch fixes this. (The BSTR data type is not exposed by ctypes or ctypes.wintypes, because it is only used in connection with Windows COM objects.)

The patch should be applied to release25-maint and trunk.

----------------------------------------------------------------------

>Comment By: Thomas Heller (theller)
Date: 2007-01-25 20:24

Message:
Logged In: YES
user_id=11105
Originator: YES

Fixed in r53556, r53557 (trunk) and r53558 (release25-maint).

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1643874&group_id=5470
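[The invariant the patch restores, every SysAllocString() matched by a SysFreeString(), can be seen in a small Windows-only sketch that drives the same oleaut32 calls by hand; this is an illustration of the pairing, not the patch itself.]

    import ctypes

    oleaut32 = ctypes.windll.oleaut32
    oleaut32.SysAllocString.restype = ctypes.c_void_p
    oleaut32.SysAllocString.argtypes = [ctypes.c_wchar_p]
    oleaut32.SysFreeString.argtypes = [ctypes.c_void_p]
    oleaut32.SysStringLen.argtypes = [ctypes.c_void_p]

    bstr = oleaut32.SysAllocString(u"abcdefghijk")  # the allocation
    try:
        print oleaut32.SysStringLen(bstr)           # -> 11
    finally:
        oleaut32.SysFreeString(bstr)                # the call the leaking code never made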
From noreply at sourceforge.net Thu Jan 25 23:12:30 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 14:12:30 -0800
Subject: [Patches] [ python-Patches-1644818 ] Allow importing built-in submodules
Message-ID:

Patches item #1644818, was opened at 2007-01-25 22:12
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644818&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code)
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Miguel Lobo (mlobo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Allow importing built-in submodules

Initial Comment:
At the moment, importing built-in submodules (in my case PyQt4.QtCore and PyQt4.QtGui) does not work. This seems to be because find_module in import.c checks only the module name (e.g. QtCore) against the built-in list, when it should use the full name (e.g. PyQt4.QtCore) instead.

Also, the above check is performed after the code that checks whether the parent module is frozen, which would have already exited in that case.

By moving the is_builtin() check to earlier in find_module and using fullname instead of name, I can build PyQt4.QtCore and PyQt4.QtGui into the interpreter and import and use them with no problem whatsoever, even if their parent module (PyQt4) is frozen.

I have run the regression tests and everything seems OK. I am completely new to CPython development, so it is quite possible that my solution is undesirable or that I have done something incorrectly. Please let me know if that is the case.

Finally, the attached patch is for Python 2.5, but I have checked that it also applies to the current svn trunk with only a one-line offset.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644818&group_id=5470
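[A quick way to see what the report is about, assuming an interpreter built with such statically linked submodules; PyQt4.QtCore is the reporter's example, not something a stock build contains. The table of built-ins is keyed by the name in the module init table, so on such a build it is the full dotted name that has to be looked up.]

    import sys

    # On a build with PyQt4.QtCore compiled in under its full dotted name,
    # that full name is what appears among the built-ins; 'QtCore' alone
    # does not.
    print 'PyQt4.QtCore' in sys.builtin_module_names
    print 'QtCore' in sys.builtin_module_names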
From noreply at sourceforge.net Thu Jan 25 23:23:02 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Thu, 25 Jan 2007 14:23:02 -0800
Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib
Message-ID:

Patches item #1644218, was opened at 2007-01-25 10:38
Message generated for change (Comment added) made by nogradi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Daniel Nogradi (nogradi)
Assigned to: Nobody/Anonymous (nobody)
Summary: file -> open in stdlib

----------------------------------------------------------------------

>Comment By: Daniel Nogradi (nogradi)
Date: 2007-01-25 23:23

Message:
Logged In: YES
user_id=1438337
Originator: YES

Sounds good :)

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

From noreply at sourceforge.net Fri Jan 26 19:58:25 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Fri, 26 Jan 2007 10:58:25 -0800
Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib
Message-ID:

Patches item #1644218, was opened at 2007-01-25 10:38
Message generated for change (Comment added) made by nogradi
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Daniel Nogradi (nogradi)
Assigned to: Nobody/Anonymous (nobody)
Summary: file -> open in stdlib

----------------------------------------------------------------------

>Comment By: Daniel Nogradi (nogradi)
Date: 2007-01-26 19:58

Message:
Logged In: YES
user_id=1438337
Originator: YES

I've just checked, and in the py3k branch this has already been done.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

From noreply at sourceforge.net Sat Jan 27 09:48:33 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 00:48:33 -0800
Subject: [Patches] [ python-Patches-1642844 ] comments to clarify complexobject.c
Message-ID:

Patches item #1642844, was opened at 2007-01-23 19:34
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642844&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code)
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jim Jewett (jimjjewett)
>Assigned to: Tim Peters (tim_one)
Summary: comments to clarify complexobject.c

Initial Comment:
The constructor for a complex takes two values, representing the real and imaginary parts. Obviously, these should normally both be real numbers, but they don't have to be. The code to cater to complex arguments led to even Tim Peters asking WTF?

http://mail.python.org/pipermail/python-dev/2007-January/070732.html

This patch just adds comments.

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 08:48

Message:
Logged In: YES
user_id=849994
Originator: NO

Let Tim decide whether these are useful.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1642844&group_id=5470
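[For readers wondering what a complex argument to the complex() constructor even means: complex(a, b) behaves like a + b*1j, so a complex value in either slot is folded in accordingly. A quick worked example:]

    # complex(a, b) == a + b*1j, even when a and b are themselves complex.
    a = 1 + 2j
    b = 3 + 4j
    print complex(a, b)      # (-3+5j)
    print a + b * 1j         # (-3+5j), the same thing spelled out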
From noreply at sourceforge.net Sat Jan 27 09:52:29 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 00:52:29 -0800
Subject: [Patches] [ python-Patches-1644218 ] file -> open in stdlib
Message-ID:

Patches item #1644218, was opened at 2007-01-25 09:38
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470

Category: Library (Lib)
Group: None
>Status: Closed
>Resolution: Postponed
Priority: 5
Private: No
Submitted By: Daniel Nogradi (nogradi)
Assigned to: Nobody/Anonymous (nobody)
Summary: file -> open in stdlib

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 08:52

Message:
Logged In: YES
user_id=849994
Originator: NO

Then, given that it's only four occurrences, I think we needn't bother with the 2.x line.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1644218&group_id=5470
From noreply at sourceforge.net Sat Jan 27 18:43:16 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 09:43:16 -0800
Subject: [Patches] [ python-Patches-1638243 ] compiler.pycodegen causes crashes when compiling 'with'
Message-ID:

Patches item #1638243, was opened at 2007-01-18 03:52
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638243&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Parser/Compiler
Group: Python 2.5
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: kirat (kirat)
>Assigned to: Georg Brandl (gbrandl)
Summary: compiler.pycodegen causes crashes when compiling 'with'

Initial Comment:
The compiler package in the Python library is missing a LOAD/DELETE just before the WITH_CLEANUP instruction. Also, transformer isn't creating the with_var as an assignment. So the following little code snippet will crash if you compile and run it with compiler.compile():

class TrivialContext:
    def __enter__(self):
        return self
    def __exit__(self, *exc_info):
        pass

def f():
    with TrivialContext() as tc:
        return 1

f()

The fix is just a few lines. I'm enclosing a patch against the Python 2.5 source. I've also added the above as a test case to the test_compiler.py file.

regards,
-Kirat

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 17:43

Message:
Logged In: YES
user_id=849994
Originator: NO

Thanks for the patch, this is fixed now in rev. 53575, 53576 (2.5).

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638243&group_id=5470

From noreply at sourceforge.net Sat Jan 27 19:00:25 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 10:00:25 -0800
Subject: [Patches] [ python-Patches-1634778 ] Add aliases for latin7/9/10 charsets
Message-ID:

Patches item #1634778, was opened at 2007-01-13 17:39
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634778&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.5
>Status: Closed
>Resolution: Accepted
Priority: 5
Private: No
Submitted By: Christoph Zwerschke (cito)
>Assigned to: Georg Brandl (gbrandl)
Summary: Add aliases for latin7/9/10 charsets

Initial Comment:
This patch adds the latin-7, latin-9 and latin-10 aliases in some places where they were missing (see http://mail.python.org/pipermail/python-list/2006-December/416921.html).

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2007-01-27 18:00

Message:
Logged In: YES
user_id=849994
Originator: NO

Committed in rev. 53578. I don't think this is backportable, since it adds a new "feature" -- referring to iso8859-15 by "latin9", for example.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1634778&group_id=5470
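[A small interactive check of what the alias patch enables; this assumes an interpreter with the new aliases applied, since on an unpatched 2.5 the 'latin9' lookup raises LookupError.]

    import codecs

    # 'latin9' now resolves to the iso8859-15 codec.
    print codecs.lookup('latin9').name              # -> 'iso8859-15'

    # iso8859-15 is the Latin variant with the euro sign, at 0xA4.
    print repr(u'\u20ac'.encode('latin9'))          # -> '\xa4'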
From noreply at sourceforge.net Sat Jan 27 20:10:02 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 11:10:02 -0800
Subject: [Patches] [ python-Patches-1641790 ] logging leaks loggers
Message-ID:

Patches item #1641790, was opened at 2007-01-22 09:00
Message generated for change (Comment added) made by josiahcarlson
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.6
Status: Closed
Resolution: Invalid
Priority: 5
Private: No
Submitted By: TH (therve)
Assigned to: Vinay Sajip (vsajip)
Summary: logging leaks loggers

Initial Comment:
In our application, we used to create a logger per client (to get IP/port automatically in the prefix). Unfortunately, logging leaks loggers by keeping them in an internal dict (the loggerDict attribute of Manager). Attached is a patch using a weakref object, with a test.

----------------------------------------------------------------------

Comment By: Josiah Carlson (josiahcarlson)
Date: 2007-01-27 11:10

Message:
Logged In: YES
user_id=341410
Originator: NO

pitrou: you aren't understanding vsajip. Either factories or custom classes are *trivial* to write to "do the right thing". If I understand what you are doing, you have been doing...

class connection:
    def __init__(self, ...):
        self.logger = logging.getLogger()
    def foo(self, ...):
        self.logger.info(message)  # or equivalent debug, warning, etc.

If you define the following class:

class loggingwrapper(object):
    __slots__ = ['socketinfo']
    def __init__(self, socketinfo):
        self.socketinfo = str(socketinfo)
    def __getattr__(self, attr):
        fcn = getattr(logging.getLogger(''), attr)
        def f2(msg, *args, **kwargs):
            return fcn("%s %s" % (self.socketinfo, str(msg)), *args, **kwargs)
        return f2

You can then do *almost* the exact same thing you were doing before...

class connection:
    def __init__(self, ...):
        self.logger = loggingwrapper(socketinfo)  # note the change
    def foo(self, ...):
        self.logger.info(message)

And it will work as you want.

----------------------------------------------------------------------

Comment By: Antoine Pitrou (pitrou)
Date: 2007-01-23 09:06

Message:
Logged In: YES
user_id=133955
Originator: NO

Ok, since I was the one bitten by this bug I might as well add my 2 cents to the discussion.

vsajip:
> 1. Use the 'extra' parameter (added in Python 2.5).

This is not practical. I want to define a prefix once and for all, for all log messages that will be output in a given context. Explicitly adding a parameter to every log call does not help. (Of course I can write wrappers to do this automatically - and that's what I ended up doing - but then I must write six of them: one for each of "debug", "info", "warning", "error", "critical", and "exception"...)

> 2. Use a connection-specific factory to obtain the logging message, or
> wrap the logging call on a connection-specific object which inserts the
> connection info.

I don't even know what this means, but it sounds way overkill...

> 3. Use something other than a literal string for the message - as
> documented, any object can be used as the message, and the logging
> system calls str() on it to get the actual text of the message. The
> "something" can be an instance of a class which Does The Right Thing.

IIUC this means some explicit machinery on each logging call, since I have to wrap every string in a constructor. Just like the "extra" parameter, with a slightly different flavour.

It's disturbing that the logging module has so many powerful options but no way of conveniently doing simple things without creating memory leaks...

----------------------------------------------------------------------

Comment By: TH (therve)
Date: 2007-01-23 00:54

Message:
Logged In: YES
user_id=1038797
Originator: YES

OK, I understand the design. But it's not clear in the documentation that once you've called getLogger('id') the logger will live forever. It's especially problematic on long-running processes.
It would be great to have at least a warning in the documentation about this feature.

----------------------------------------------------------------------

Comment By: Vinay Sajip (vsajip)
Date: 2007-01-23 00:42

Message:
Logged In: YES
user_id=308438
Originator: NO

This is not a leak - it's by design. You are not using best practice when you create a logger per client; the specific scenario of getting connection info into the logging message can currently be done in several ways, e.g.

1. Use the 'extra' parameter (added in Python 2.5).
2. Use a connection-specific factory to obtain the logging message, or wrap the logging call on a connection-specific object which inserts the connection info.
3. Use something other than a literal string for the message - as documented, any object can be used as the message, and the logging system calls str() on it to get the actual text of the message. The "something" can be an instance of a class which Does The Right Thing.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2007-01-22 20:47

Message:
Logged In: YES
user_id=33168
Originator: NO

Vinay, can you provide some direction? Thanks.

----------------------------------------------------------------------

Comment By: TH (therve)
Date: 2007-01-22 09:09

Message:
Logged In: YES
user_id=1038797
Originator: YES

Looking at the documentation, it seems keeping it is mandatory, because you must get the same instance back from getLogger. Maybe it'd need a documented way to remove a logger from the dict, though.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641790&group_id=5470
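[For the record, option 1 from Vinay's list needs very little code. A minimal sketch, with made-up field names: one shared logger, with the per-connection information carried by the 'extra' dict instead of by per-client logger objects, so nothing accumulates in Manager.loggerDict.]

    import logging

    logging.basicConfig(format="%(ip)s:%(port)s %(message)s")
    log = logging.getLogger("server")

    def handle_client(ip, port):
        # Each record gets the connection info without creating a new logger.
        log.warning("client connected", extra={"ip": ip, "port": port})

    handle_client("10.0.0.1", 2112)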
From noreply at sourceforge.net Sun Jan 28 03:48:24 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sat, 27 Jan 2007 18:48:24 -0800
Subject: [Patches] [ python-Patches-1641544 ] rlcompleter tab completion in pdb
Message-ID:

Patches item #1641544, was opened at 2007-01-22 06:52
Message generated for change (Comment added) made by rockyb
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641544&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Stephen Emslie (stephenemslie)
Assigned to: Nobody/Anonymous (nobody)
Summary: rlcompleter tab completion in pdb

Initial Comment:
By default, Pdb and other instances of Cmd complete names for their commands. However, in the context of pdb, I think it is more useful to complete identifiers and keywords in its current scope than to complete names of commands (most of which have single-letter abbreviations). I believe this makes pdb a far more usable introspection tool.

I have discussed this proposal on the python-ideas list:
http://mail.python.org/pipermail/python-ideas/2007-January/000084.html

This patch implements the following:
- creates an rlcompleter instance on Pdb if readline is available
- adds a 'complete' method to the Pdb class. The only difference from rlcompleter's default behaviour is that it also updates rlcompleter's namespace to reflect the current local and global namespace, which is necessary because pdb changes scope as it steps through a program

This is a patch against python/Lib/pdb.py rev. 51745.

----------------------------------------------------------------------

Comment By: Rocky Bernstein (rockyb)
Date: 2007-01-27 21:48

Message:
Logged In: YES
user_id=158581
Originator: NO

I experimented with this a little in the pydb variant (http://bashdb.sf.net/pydb). Some observations.

First, one can include the debugger commands in the namespace without too much trouble. See what's checked into CVS for pydb; in particular, look at the complete method of pydbbdb. (Personally, I think adding debugger commands to the list of completions is a little more honest.)

The second problem I have is that completion is not all that sensitive to the preceding context. If the line begins "step" or "1 + ", is it really correct to list all valid symbols?

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1641544&group_id=5470
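[The idea in the patch can be sketched in a few lines; this is an illustrative reimplementation, not the patch itself, and CompletingPdb is a made-up name. Cmd's readline hook is the complete() method, so overriding it to defer to rlcompleter over the current frame's namespaces gives identifier completion that follows the debugger from frame to frame.]

    import pdb
    import rlcompleter

    class CompletingPdb(pdb.Pdb):
        def complete(self, text, state):
            # Rebuild the namespace on every call, since pdb changes scope
            # as it steps through the program.
            namespace = self.curframe.f_globals.copy()
            namespace.update(self.curframe.f_locals)
            return rlcompleter.Completer(namespace).complete(text, state)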
From noreply at sourceforge.net Sun Jan 28 15:21:49 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Sun, 28 Jan 2007 06:21:49 -0800
Subject: [Patches] [ python-Patches-1646432 ] ConfigParser getboolean() consistency
Message-ID:

Patches item #1646432, was opened at 2007-01-28 16:21
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1646432&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Tal Einat (taleinat)
Assigned to: Nobody/Anonymous (nobody)
Summary: ConfigParser getboolean() consistency

Initial Comment:
Minor code change - made the getboolean() implementation more consistent with the other get...() methods (i.e. it uses _get). Functionality is unchanged.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1646432&group_id=5470
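[A guess at the shape of that change; the patch itself isn't shown here, and _convert_to_boolean is a hypothetical helper name. getint() and getfloat() go through RawConfigParser._get(), which applies a converter to get()'s result, so getboolean() can do the same with a small boolean converter.]

    import ConfigParser

    class SketchConfigParser(ConfigParser.RawConfigParser):
        def _convert_to_boolean(self, value):
            if value.lower() not in self._boolean_states:
                raise ValueError('Not a boolean: %s' % value)
            return self._boolean_states[value.lower()]

        def getboolean(self, section, option):
            # Same shape as getint()/getfloat(): route through _get().
            return self._get(section, self._convert_to_boolean, option)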
From noreply at sourceforge.net Mon Jan 29 09:41:08 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 00:41:08 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 16:13
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2007-01-29 09:41

Message:
Logged In: YES
user_id=21627
Originator: NO

I'm -1 on this patch. The introduction of a pipe makes it essentially gtk-specific: it will only work with gtk (for a while, until other frameworks catch up - which may take years), and it will only wake up a gtk thread that is in the gtk poll call. It fails to support cases where the main thread blocks in a different blocking call (i.e. neither select nor poll). I think a better mechanism is needed to support that case, e.g. waking up the main thread with pthread_kill.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 12:07:03 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 03:07:03 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 15:13
Message generated for change (Comment added) made by gustavo
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

Category: Core (C code)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------

>Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2007-01-29 11:07

Message:
Logged In: YES
user_id=908
Originator: YES

But if you think about it, support for other cases has to be built as an extension of this patch. In an async handler it's not safe to do much of anything; the current framework is not async safe, it just happens to work most of the time.

If we use pthread_kill we start to enter platform-specific code. What happens on systems without POSIX threads? What signal do we use to wake up the main thread? Do system calls that receive signals return EINTR on a given platform or not (can we guarantee it always happens)? Which one is the main thread, anyway?

In any case, anything we want to do can be layered on top of the Py_signal_pipe API in a very safe way, because reading from the pipe is decoupled from the async handler, so that handler is allowed to safely do anything it wants, like pthread_kill. But IMHO that part should be left out of Python; let the frameworks do it themselves.
The introduction of a pipe makes it essentially gtk-specific: It will only work with gtk (for a while, until other frameworks catch up - which may take years), and it will only wake up a gtk thread that is in the gtk poll call. It fails to support cases where the main thread blocks in a different blocking call (i.e. neither select nor poll). I think a better mechanism is needed to support that case, e.g. by waking up the main thread with pthread_kill.

----------------------------------------------------------------------
Comment By: Jp Calderone (kuran) Date: 2007-01-25 19:22
Message: Logged In: YES user_id=366566 Originator: NO

The attached patch also fixes a bug in the order in which signal handlers are run. Previously, they would be run in numerically ascending signal number order. With the patch attached, they will be run in the order they are processed by Python.

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus) Date: 2007-01-25 18:38
Message: Logged In: YES user_id=12364 Originator: NO

gustavo, there are two patches attached and it's not entirely clear which one is current. Please delete the older one.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-25 18:11
Message: Logged In: YES user_id=908 Originator: YES

File Added: python-signals.diff

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-25 17:57
Message: Logged In: YES user_id=908 Originator: YES

Damn this SF bug tracker! ;( The patch I uploaded (yes, it was me, not anonymous) fixes some bugs and also fixes http://www.python.org/sf/1643738

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus) Date: 2006-09-29 22:09
Message: Logged In: YES user_id=12364

I'm concerned about the interface to PyOS_InterruptOccurred(). The original version peeked ahead for only that signal, and handled it manually. No need to report errors. The new version will first call arbitrary python functions to handle any earlier signals, then an arbitrary python function for the interrupt itself, and then will not report any errors they produce. It may not even get to the interrupt, even if one is waiting. I'm not sure PyOS_InterruptOccurred() is only called at times when arbitrary python code is acceptable. I suspect it should be dropped entirely, in favour of a more robust API. Otoh, some of it appears quite crufty. One version in intrcheck.c lacks a return statement, invoking undefined behavior in C.

One other concern I have is that signalmodule.c should never be unloaded, if loaded via dlopen. A delayed signal handler may reference it indefinitely. However, I see no sane way to enforce this.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2006-09-28 16:31
Message: Logged In: YES user_id=908

> ...sizeof(char) will STILL return 1 in such a case...

Even if sizeof(char) == 1, 'sizeof(signum_c)' is much more readable than just a plain '1'.

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 13:37:14 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 04:37:14 -0800
Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function
Message-ID:

Patches item #1633807, was opened at 2007-01-12 18:13 Message generated for change (Comment added) made by anthonybaxter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Nobody/Anonymous (nobody)
Summary: from __future__ import print_function

Initial Comment: This was done partly as a learning exercise, partly just as a vague idea that might prove to be practical (chatting with Neal at the time, but all blame is with me, not him!) The following adds 'from __future__ import print_function' to 2.x. When this is enabled, 'print' is no longer a statement. Combined with copying bltinmodule.c:builtin_print() from the p3yk trunk, this should give some compatibility options for 2.6 <-> 3.0. Note that for some reason I don't fully understand, this doesn't work in interactive mode. For some reason, in interactive mode, the parser flags get reset for each line. Wah.

----------------------------------------------------------------------
>Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-29 23:37
Message: Logged In: YES user_id=29957 Originator: YES

Attached version 3 of the patch. I've added an '#if 0'd warning in ast.c - for instance, when enabled, you get:

./setup.py:1336: SyntaxWarning: print no longer a statement in Py3.0

I'll make a new version of a -W py3k patch that enables this as well. I've made the other cleanup suggested by twouters. I'm not clear on the best way to do the tests for this - the from __future__ needs to be at the start of the file. My concern is that anything that tries to compile the whole test file with this under a previous version will choke and die on the print-as-function. Not sure if this is a hugely bad problem or not. Docs will follow once I bother wrapping my head around LaTeX and figuring out the best way to do the docs. I'm guessing we need a note in ref/ref6.tex in the section on the print statement, another bit in the same file in the subsection on Future statements, and something in lib/libbltin.tex. Did I miss anywhere? In current 3.0, the builtin is called Print, not print. Is there a reason for this? Is it just a matter of updating all the tests and ripping out the support for the print statement and the related opcodes? If so, I'll tackle that next.
Doing this does mean that the docs and the stdlib and the tests will all need a huge amount of updating, and it will make merging from the trunk to the p3yk branch much more painful. While I'm in the vague area - why is PRINT_ITEM inlined in ceval.c? Couldn't it be punted out to a separate function, making the main switch statement just that little bit smaller? I can't imagine that making 'print' that little tiny bit faster is actually worthwhile, compared to shrinking the main switch statement. except E as V, I'll look at later in a different patch. My tree is already getting quite cluttered with uncommitted patches :-)

File Added: print_function.patch3

----------------------------------------------------------------------
Comment By: Thomas Wouters (twouters) Date: 2007-01-18 02:58
Message: Logged In: YES user_id=34209 Originator: NO

You seem to have '#if 0'ed-out some code related to the with/as-statement warnings; I suggest just removing them. Since you're in this code now, it might make sense to provide a commented-out warning about the use of the print statement, so we won't have to figure it out later (in Python 2.9 or when we add -Wp3yk.) It needs a test, and probably a doc change somewhere.

----------------------------------------------------------------------
Comment By: Guido van Rossum (gvanrossum) Date: 2007-01-18 02:24
Message: Logged In: YES user_id=6380 Originator: NO

I don't think we need to do anything special for exec, as the exec(s, locals, globals) syntax is already (still :-) supported in 2.x with identical semantics as in 3.0. except E as V *syntax* can go in without a future stmt; and (only when that syntax is used) it should also enforce the new semantics (V must be a simple name and is deleted at the end of the except clause). I think Anthony's patch is a great idea, but I'll refrain from reviewing it. I'd say "just do it". :-)

----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz) Date: 2007-01-17 18:42
Message: Logged In: YES user_id=33168 Originator: NO

Guido, this is the patch I was talking about wrt supporting a print function in 2.6. exec could get similar treatment. You mentioned in mail that things like except E as V: can go in without a future stmt. I agree.

----------------------------------------------------------------------
Comment By: Anthony Baxter (anthonybaxter) Date: 2007-01-12 18:31
Message: Logged In: YES user_id=29957 Originator: YES

Updated version of patch - fixes interactive mode, adds builtins.print

File Added: print_function.patch

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470

From noreply at sourceforge.net Mon Jan 29 17:57:32 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 08:57:32 -0800
Subject: [Patches] [ python-Patches-1615158 ] POSIX capabilities support
Message-ID:

Patches item #1615158, was opened at 2006-12-13 18:10 Message generated for change (Comment added) made by gj0aqzda You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615158&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Matt Kern (gj0aqzda)
Assigned to: Nobody/Anonymous (nobody)
Summary: POSIX capabilities support

Initial Comment: Attached is a patch which adds POSIX capabilities support. The following API functions are supported:
* cap_clear
* cap_copy_ext
* cap_dup
* cap_from_text
* cap_get_flag
* cap_get_proc
* cap_init
* cap_set_flag
* cap_set_proc
* cap_size
* cap_to_text
The following API function is supported, but is broken with certain versions of libcap (I am running debian testing's libcap1, version 1.10-14, which has an issue; I have reported this upstream):
* cap_copy_int
The following API functions are in there as stubs, but currently are not compiled. I need access to a machine to test these. I will probably add autoconf tests for availability of these functions in due course:
* cap_get_fd
* cap_get_file
* cap_set_fd
* cap_set_file
The patch includes diffs to configure. My autoconf is however at a different revision to that used on the python trunk. You may want to re-autoconf configure.in. I've added a few API tests to test_posix.py.

----------------------------------------------------------------------
>Comment By: Matt Kern (gj0aqzda) Date: 2007-01-29 16:57
Message: Logged In: YES user_id=1667774 Originator: YES

No news on these patches in a while. To summarise, the patches are ready to go in. The issues surrounding cap_copy_int(), cap_get_*() and cap_set_*() are pretty minor. The vast majority of uses will be of the cap_get_proc(), cap_set_flag(), cap_set_proc() variety. I am not trying to hassle you; I know you don't have enough time to get through everything. However, I'll hang fire on future development of stuff that I, personally, am not going to use, until I know when/if these patches are going to go in.

----------------------------------------------------------------------
Comment By: Matt Kern (gj0aqzda) Date: 2006-12-19 10:48
Message: Logged In: YES user_id=1667774 Originator: YES

I've attached a documentation patch, which should be applied in addition to the base patch.

File Added: patch-svn-doc.diff

----------------------------------------------------------------------
Comment By: Georg Brandl (gbrandl) Date: 2006-12-16 13:25
Message: Logged In: YES user_id=849994 Originator: NO

(If you don't want to write LaTeX, it's enough to write the docs in plaintext, there are a few volunteers who will convert it appropriately.)

----------------------------------------------------------------------
Comment By: Martin v. Löwis (loewis) Date: 2006-12-16 12:28
Message: Logged In: YES user_id=21627 Originator: NO

Can you please provide documentation changes as well?
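To make the proposed interface concrete, a hypothetical usage sketch follows. The function and method names come from the API lists in this item; the CAP_* constants and the exact argument shapes are assumptions modelled on the libcap C API (cap_set_flag(3) takes a flag, a list of capabilities, and a value there) and may well be spelled differently by the actual patch:

    # Hypothetical sketch of the proposed posix capabilities API; the
    # CAP_* constant names and call signatures are assumed, not confirmed.
    import posix

    caps = posix.cap_get_proc()        # capability state of this process
    print posix.cap_to_text(caps)      # textual form, as cap_to_text(3) gives

    # Drop an effective capability, mirroring libcap's
    # cap_set_flag(caps, CAP_EFFECTIVE, 1, [CAP_NET_RAW], CAP_CLEAR):
    caps.cap_set_flag(posix.CAP_EFFECTIVE, [posix.CAP_NET_RAW], posix.CAP_CLEAR)
    caps.cap_set_proc()                # install the modified state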
----------------------------------------------------------------------
Comment By: Matt Kern (gj0aqzda) Date: 2006-12-13 18:12
Message: Logged In: YES user_id=1667774 Originator: YES

I should further add that I have implemented the following API calls as methods of the new CapabilityState object in addition to the standard functions:
* cap_clear
* cap_copy_ext
* cap_dup
* cap_get_flag
* cap_set_flag
* cap_set_proc
* cap_size
* cap_to_text

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615158&group_id=5470

From noreply at sourceforge.net Mon Jan 29 18:07:29 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 09:07:29 -0800
Subject: [Patches] [ python-Patches-1633807 ] from __future__ import print_function
Message-ID:

Patches item #1633807, was opened at 2007-01-12 02:13 Message generated for change (Comment added) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Anthony Baxter (anthonybaxter)
Assigned to: Nobody/Anonymous (nobody)
Summary: from __future__ import print_function

----------------------------------------------------------------------
>Comment By: Raymond Hettinger (rhettinger) Date: 2007-01-29 12:07
Message: Logged In: YES user_id=80475 Originator: NO

Instead of __future__ imports, it would be better to put all of this Py3.0 stuff in a single compatibility module and keep the rest of Py2.x as clean as possible.
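For illustration, this is roughly what code under the proposed future import would look like (a sketch only; the sep/end keyword behaviour assumes builtin_print is copied from the p3yk trunk as described in the initial comment):

    from __future__ import print_function

    # With the future import, 'print' is an ordinary builtin function
    # rather than a statement, so it takes 3.0-style keyword arguments
    # and can be aliased and passed around like any other object.
    print("spam", "eggs", sep=", ")   # -> spam, eggs
    log = print                       # impossible with the print statement
    log("quest complete", end="!\n")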
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1633807&group_id=5470

From noreply at sourceforge.net Mon Jan 29 19:36:40 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 10:36:40 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 16:13 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis) Date: 2007-01-29 19:36
Message: Logged In: YES user_id=21627 Originator: NO

Can you please explain in what sense the current framework isn't "async safe"? You might be referring to "async-signal-safe functions", which is a term specified by POSIX, referring to functions that may be called in a signal handler. The Python signal handler, signal_handler, calls these functions:
* getpid
* Py_AddPendingCall
* PyOS_setsig
** sigemptyset
** sigaction
AFAICT, this is the complete list of functions called in a signal handler. Of these, only getpid, sigemptyset, and sigaction are library functions, and they are all specified as async-signal safe. So the current implementation is async-signal safe. Usage of pthread_kill wouldn't make it more platform-specific than your patch. pthread_kill is part of the POSIX standard, and so is pipe(2). So both changes work on a POSIX system, and neither change would be portable if all you have is standard C.
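The mechanism under debate, the self-pipe trick, can be sketched in pure Python (illustrative only: the patch does this with a C-level handler, and Py_signal_pipe / Py_signal_pipe_w name its C API, not anything importable; the names below are made up):

    import fcntl
    import os
    import select
    import signal

    # Self-pipe trick: the signal handler only write()s one byte; an event
    # loop (gtk's poll, a select() reactor, ...) watches the read end for
    # readability and then lets the normal signal machinery run.
    pipe_r, pipe_w = os.pipe()
    for fd in (pipe_r, pipe_w):
        flags = fcntl.fcntl(fd, fcntl.F_GETFL)
        fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

    def wakeup(signum, frame):
        try:
            os.write(pipe_w, chr(signum))  # write() is async-signal safe
        except OSError:
            pass  # pipe full: a wakeup is already pending, dropping is fine

    signal.signal(signal.SIGUSR1, wakeup)
    os.kill(os.getpid(), signal.SIGUSR1)   # handler runs, writes one byte

    # The loop must only poll/select for readability, not consume the byte:
    readable, _, _ = select.select([pipe_r], [], [], 0)
    assert pipe_r in readable

The non-blocking write is what keeps a burst of signals from blocking the handler when the pipe fills up, at the cost of coalescing some of them.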
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 19:53:08 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 10:53:08 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 15:13 Message generated for change (Comment added) made by gustavo You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No
Submitted By: Gustavo J. A. M. Carneiro (gustavo)
Assigned to: Nobody/Anonymous (nobody)
Summary: Py_signal_pipe

----------------------------------------------------------------------
>Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-29 18:53
Message: Logged In: YES user_id=908 Originator: YES

Py_AddPendingCall is not async safe. It's obvious looking at the code, and it even says so in a comment:

/* XXX Begin critical section */
/* XXX If you want this to be safe against nested
   XXX asynchronous calls, you'll have to work harder! */

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 20:30:04 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 11:30:04 -0800
Subject: [Patches] [ python-Patches-1615158 ] POSIX capabilities support
Message-ID:

Patches item #1615158, was opened at 2006-12-13 19:10 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615158&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.
Category: Modules Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Matt Kern (gj0aqzda) Assigned to: Nobody/Anonymous (nobody) Summary: POSIX capabilities support Initial Comment: Attached is a patch which adds POSIX capabilities support. The following API functions are supported: * cap_clear * cap_copy_ext * cap_dup * cap_from_text * cap_get_flag * cap_get_proc * cap_init * cap_set_flag * cap_set_proc * cap_size * cap_to_text The following API function is supported, but is broken with certain versions of libcap (I am running debian testing's libcap1, version 1.10-14, which has an issue; I have reported this upstream): * cap_copy_int The following API functions are in there as stubs, but currently are not compiled. I need access to a machine to test these. I will probably add autoconf tests for availability of these functions in due course: * cap_get_fd * cap_get_file * cap_set_fd * cap_set_file The patch includes diffs to configure. My autoconf is however at a different revision to that used on the python trunk. You may want to re-autoconf configure.in. I've added a few API tests to test_posix.py. ---------------------------------------------------------------------- >Comment By: Martin v. L?wis (loewis) Date: 2007-01-29 20:30 Message: Logged In: YES user_id=21627 Originator: NO The patch cannot go in in its current form (I started applying it, but then found that I just can't do it). It contains conditional, commented out code. Either the code is correct, then it should be added, or it is incorrect, in which case it should be removed entirely. There shouldn't be any work-in-progress code in the Python repository whatsoever. This refers to both the if 0 blocks (which I thought I can safely delete), as well as commented-out entries in CapabilityStateMethods (for which I didn't know what to do). So while you are revising it, I have a few remarks: - you can safely omit the generated configure changes from the patch - I will regenerate them, anyway. - please follow the alphabet in the header files in configure.in (bsdtty.h < capabilities.h) - please don't expose method on objects on which they aren't methods. E.g. cap_clear is available both as a method and a module-level function; that can't be both right (there should be one way to do it) Following the socket API, I think offering these as methods is reasonable - try avoiding the extra copy in copy_ext (copying directly into the string). If you keep malloc calls, don't return NULL without setting a Python exception. - use the "s" format for copy_int and from_text - consider using booleans for [gs]et_flags ---------------------------------------------------------------------- Comment By: Matt Kern (gj0aqzda) Date: 2007-01-29 17:57 Message: Logged In: YES user_id=1667774 Originator: YES No news on these patches in a while. To summarise, the patches are ready to go in. The issues surrounding cap_copy_int(), cap_get_*() and cap_set_*() are pretty minor. The vast majority of uses will be of the cap_get_proc(), cap_set_flag(), cap_set_proc() variety. I am not trying to hassle you; I know you don't have enough time to get through everything. However, I'll hang fire on future development of stuff that I, personally, am not going to use, until I know when/if these patches are going to go in. 
---------------------------------------------------------------------- Comment By: Matt Kern (gj0aqzda) Date: 2006-12-19 11:48 Message: Logged In: YES user_id=1667774 Originator: YES I've attached a documentation patch, which should be applied in addition to the base patch. File Added: patch-svn-doc.diff ---------------------------------------------------------------------- Comment By: Georg Brandl (gbrandl) Date: 2006-12-16 14:25 Message: Logged In: YES user_id=849994 Originator: NO (If you don't want to write LaTeX, it's enough to write the docs in plaintext, there are a few volunteers who will convert it appropriately.) ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2006-12-16 13:28 Message: Logged In: YES user_id=21627 Originator: NO Can you please provide documentation changes as well? ---------------------------------------------------------------------- Comment By: Matt Kern (gj0aqzda) Date: 2006-12-13 19:12 Message: Logged In: YES user_id=1667774 Originator: YES I should further add that I have implemented the following API calls as methods of the new CapabilityState object in addition to the standard functions: * cap_clear * cap_copy_ext * cap_dup * cap_get_flag * cap_set_flag * cap_set_proc * cap_size * cap_to_text ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1615158&group_id=5470 From noreply at sourceforge.net Mon Jan 29 21:01:27 2007 From: noreply at sourceforge.net (SourceForge.net) Date: Mon, 29 Jan 2007 12:01:27 -0800 Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe Message-ID: Patches item #1564547, was opened at 2006-09-24 16:13 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Core (C code) Group: Python 2.6 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Gustavo J. A. M. Carneiro (gustavo) Assigned to: Nobody/Anonymous (nobody) Summary: Py_signal_pipe Initial Comment: Problem: how to wakeup extension modules running poll() so that they can let python check for signals. Solution: use a pipe to communicate between signal handlers and main thread. The read end of the pipe can then be monitored by poll/select for input events and wake up poll(). As a side benefit, it avoids the usage of Py_AddPendingCall / Py_MakePendingCalls, which are patently not "async safe". All explained in this thread: http://mail.python.org/pipermail/python-dev/2006-September/068569.html ---------------------------------------------------------------------- >Comment By: Martin v. L?wis (loewis) Date: 2007-01-29 21:01 Message: Logged In: YES user_id=21627 Originator: NO I see. I think this can be fixed fairly easily: install the signal handlers with sigaction, and prevent any nested delivery of signals through sa_mask. Then, no two signal handlers will get invoked simultaneously. ---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-29 19:53 Message: Logged In: YES user_id=908 Originator: YES Py_AddPendingCall is not async safe. 
It's obvious looking at the code, and it even says so in a comment: /* XXX Begin critical section */ /* XXX If you want this to be safe against nested XXX asynchronous calls, you'll have to work harder! */ ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2007-01-29 19:36 Message: Logged In: YES user_id=21627 Originator: NO Can you please explain in what sense the current framework isn't "async safe"? You might be referring to "async-signal-safe functions", which is a term specified by POSIX, referring to functions that may be called in a signal handler. The Python signal handler, signal_handler, calls these functions: * getpid * Py_AddPendingCall * PyOS_setsig ** sigemptyset ** sigaction AFAICT, this is the complete list of functions called in a signal handler. Of these, only getpid, sigemptyset, and sigaction are library functions, and they are all specified as async-signal safe. So the current implementation is async-signal safe. Usage of pthread_kill wouldn't make it more platform-specific than your patch. pthread_kill is part of the POSIX standard, and so is pipe(2). So both changes work on a POSIX system, and neither change would be portable if all you have is standard C. ---------------------------------------------------------------------- Comment By: Gustavo J. A. M. Carneiro (gustavo) Date: 2007-01-29 12:07 Message: Logged In: YES user_id=908 Originator: YES But if you think about it, support for other cases have to be extensions of this patch. In an async handler it's not safe to do about anything. The current framework is not async safe, it just happens to work most of the time. If we use pthread_kill we will start to enter platform-specific code; what will happen in systems without POSIX threads? What signal do we use to wake up the main thread? Do system calls that receive signals return EINTR for this platform or not (can we guarantee it always happens)? Which one is the main thread anyway? In any case, anything we want to do can be layered on top of the Py_signal_pipe API in a very safe way, because reading from a pipe is decoupled from the async handler, therefore this handler is allowed to safely do anything it wants, like pthread_kill. But IMHO that part should be left out of Python; let the frameworks do it themselves. ---------------------------------------------------------------------- Comment By: Martin v. L?wis (loewis) Date: 2007-01-29 09:41 Message: Logged In: YES user_id=21627 Originator: NO I'm -1 on this patch. The introduction of a pipe makes it essentially gtk-specific: It will only work with gtk (for a while, until other frameworks catch up - which may take years), and it will only wake up a gtk thread that is in the gtk poll call. It fails to support cases where the main thread blocks in a different blocking call (i.e. neither select nor poll). I think a better mechanism is needed to support that case, e.g. by waking up the main thread with pthread_kill. ---------------------------------------------------------------------- Comment By: Jp Calderone (kuran) Date: 2007-01-25 20:22 Message: Logged In: YES user_id=366566 Originator: NO The attached patch also fixes a bug in the order in which signal handlers are run. Previously, they would be run in numerically ascending signal number order. With the patch attached, they will be run in the order they are processed by Python. 
----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2007-01-25 19:38
Message: Logged In: YES user_id=12364 Originator: NO

gustavo, there are two patches attached and it's not entirely clear which one is current. Please delete the older one.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2007-01-25 19:11
Message: Logged In: YES user_id=908 Originator: YES

File Added: python-signals.diff

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2007-01-25 18:57
Message: Logged In: YES user_id=908 Originator: YES

Damn this SF bug tracker! ;( The patch I uploaded (yes, it was me, not anonymous) fixes some bugs and also fixes http://www.python.org/sf/1643738

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2006-09-29 23:09
Message: Logged In: YES user_id=12364

I'm concerned about the interface to PyOS_InterruptOccurred(). The original version peeked ahead for only that signal, and handled it manually. No need to report errors. The new version will first call arbitrary Python functions to handle any earlier signals, then an arbitrary Python function for the interrupt itself, and then will not report any errors they produce. It may not even get to the interrupt, even if one is waiting. I'm not sure PyOS_InterruptOccurred() is called when arbitrary Python code is acceptable. I suspect it should be dropped entirely, in favour of a more robust API. OTOH, some of it appears quite crufty. One version in intrcheck.c lacks a return statement, invoking undefined behavior in C.

One other concern I have is that signalmodule.c should never be unloaded, if loaded via dlopen. A delayed signal handler may reference it indefinitely. However, I see no sane way to enforce this.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2006-09-28 17:31
Message: Logged In: YES user_id=908

> ...sizeof(char) will STILL return 1 in such a case...

Even if sizeof(char) == 1, 'sizeof(signum_c)' is much more readable than just a plain '1'.

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2006-09-28 04:50
Message: Logged In: YES user_id=12364

Any compiler where sizeof(char) != 1 is *deeply* broken. In C, a byte isn't always 8 bits (if it uses bits at all!). It's possible for a char to take (for instance) 32 bits, but sizeof(char) will STILL return 1 in such a case. A mention of this in the wild is here: http://lkml.org/lkml/1998/1/22/4 If you find a compiler that's broken, I'd love to hear about it. :)

# error Too many signals to fit on an unsigned char!
Should be "in", not "on" :)

A comment in signal_handler() about ignoring the return value of write() may be good.

initsignal() should keep, not replace, Py_signal_pipe/Py_signal_pipe_w if called a second time (which is possible, right?). If so, it should probably not set them until after setting non-blocking mode.

check_signals() should not call PyEval_CallObject(Handlers[signum].func, ...) if func is NULL, which may happen after finisignal() clears it.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2006-09-27 16:34
Message: Logged In: YES user_id=908

and of course this

> * PyErr_SetInterrupt() needs to set is_tripped after the call to write(), not before.

is correct, good catch. New patch uploaded.

----------------------------------------------------------------------
Comment By: Gustavo J. A. M. Carneiro (gustavo)
Date: 2006-09-27 15:42
Message: Logged In: YES user_id=908

> * Needs documentation ...

True, I'll try to add more documentation...

> * I think we should be more paranoid about the range of possible signals. NSIG does not appear to be defined by SUSv2 (no clue about POSIX). We should size the Handlers array to UCHAR_MAX and set any signals outside the range of 0..UCHAR_MAX to either 0 (null signal) or UCHAR_MAX. I'm not sure we should ever use NSIG.

I disagree. Creating an array of size UCHAR_MAX is just wasting memory. If you check the original Python code, there's already fallback code to define NSIG if it's not already defined (if not defined, it could end up being defined as 64).

> * In signal_handler() sizeof(signum_c) is inherently 1. ;)

And? I occasionally hear horror stories of platforms where sizeof(char) != 1, I'm not taking any chances :)

> * PyOS_InterruptOccurred() should probably still check that it's called from the main thread.

check_signals already bails out if that is the case. But in fact it bails out without setting the interrupt_occurred output parameter, so I fixed that.

fcntl error checking... will work on it.

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2006-09-27 01:53
Message: Logged In: YES user_id=12364

I've looked over the patch, although I haven't tested it. I have the following suggestions:
* Needs documentation explaining the signal weirdness (may drop signals, may delay indefinitely, new handlers may get signals intended for old, etc)
* Needs to be explicit that users must only poll/select to check for readability of the pipe, NOT read from it
* The comment for is_tripped refers to sigcheck(), which doesn't exist
* I think we should be more paranoid about the range of possible signals. NSIG does not appear to be defined by SUSv2 (no clue about POSIX). We should size the Handlers array to UCHAR_MAX and set any signals outside the range of 0..UCHAR_MAX to either 0 (null signal) or UCHAR_MAX. I'm not sure we should ever use NSIG.
* In signal_handler() sizeof(signum_c) is inherently 1. ;)
* The set_nonblock macro doesn't check for errors from fcntl(). I'm not sure it's worth having a macro for that anyway.
* Needs some documentation of the assumptions about read()/write() being memory barriers.
* In check_signals() sizeof(signum) is inherently 1.
* There's a blank line with tabs near the end of check_signals() ;)
* PyErr_SetInterrupt() should use a compile-time check for SIGINT being within 0..UCHAR_MAX, assuming NSIG is ripped out entirely.
* PyErr_SetInterrupt() needs to set is_tripped after the call to write(), not before.
* PyOS_InterruptOccurred() should probably still check that it's called from the main thread.
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 23:11:16 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 14:11:16 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 08:13
Message generated for change (Comment added) made by rhamphoryncus
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2007-01-29 15:11
Message: Logged In: YES user_id=12364 Originator: NO

As far as I can tell, the sig_mask argument of sigaction only applies to the thread in which the signal handler gets called. If you have multiple threads you could still have one signal handler running per thread. http://www.opengroup.org/onlinepubs/009695399/functions/sigaction.html
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Mon Jan 29 23:25:21 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 14:25:21 -0800
Subject: [Patches] [ python-Patches-1647484 ] gzip.GzipFile has no name attribute
Message-ID:

Patches item #1647484, was opened at 2007-01-29 23:25
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1647484&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Lars Gustäbel (gustaebel)
Assigned to: Nobody/Anonymous (nobody)
Summary: gzip.GzipFile has no name attribute

Initial Comment:
The gzip.GzipFile object uses a filename instead of a name attribute. This deviates from the standard practice and the interface described in "3.9 File Objects" and seems unnecessary.
Attached patch changes this but still leaves the filename attribute as a property that emits a DeprecationWarning.

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1647484&group_id=5470
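The compatibility shim proposed for GzipFile above (a new name attribute, with the old filename kept as a warning property) is a common deprecation pattern. A minimal sketch using a hypothetical stand-in class, since the real change lives inside gzip.GzipFile itself:

    import warnings

    class GzipFileLike(object):
        # Hypothetical stand-in for gzip.GzipFile; only the attribute
        # handling from the proposed patch is sketched here.
        def __init__(self, name):
            self.name = name  # the new, file-object-style attribute

        @property
        def filename(self):
            # Old spelling keeps working, but is flagged as deprecated.
            warnings.warn("use the name attribute instead of filename",
                          DeprecationWarning, stacklevel=2)
            return self.name

    f = GzipFileLike("data.gz")
    print f.name      # "data.gz", the supported spelling
    print f.filename  # same value, plus a DeprecationWarning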
From noreply at sourceforge.net Mon Jan 29 23:25:42 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 14:25:42 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 16:13
Message generated for change (Comment added) made by loewis
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

----------------------------------------------------------------------
>Comment By: Martin v. Löwis (loewis)
Date: 2007-01-29 23:25
Message: Logged In: YES user_id=21627 Originator: NO

Right. To prevent the simultaneous invocation of Py_AddPendingCall from multiple threads, two alternatives are possible:
a) protect the routine with a thread mutex, if threading is available
b) use pthread_kill in threads other than the main thread (as I proposed earlier); those other threads then wouldn't call Py_AddPendingCall anymore
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Tue Jan 30 00:59:39 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 15:59:39 -0800
Subject: [Patches] [ python-Patches-1564547 ] Py_signal_pipe
Message-ID:

Patches item #1564547, was opened at 2006-09-24 08:13
Message generated for change (Comment added) made by rhamphoryncus
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

----------------------------------------------------------------------
Comment By: Adam Olsen (rhamphoryncus)
Date: 2007-01-29 16:59
Message: Logged In: YES user_id=12364 Originator: NO

Unfortunately, neither the mutex functions nor pthread_kill() are listed as async-signal-safe: http://www.opengroup.org/onlinepubs/009695399/functions/xsh_chap02_04.html

Personally, I'd be just as happy to raise an exception if an attempt is made to import both signal and threading: doing it safely and reliably is just too difficult, so we shouldn't promote a false sense of security.
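Part of what makes the signal/threading combination so awkward: CPython only ever runs Python-level signal handlers in the main thread, no matter which thread the OS delivers the signal to, so a main thread stuck in a blocking call means handlers simply do not run. A small POSIX-only demonstration (illustration, not part of the patch):

    import signal
    import threading
    import time

    def handler(signum, frame):
        # CPython routes Python-level signal handlers to the main thread.
        print "handler ran in:", threading.currentThread().getName()

    signal.signal(signal.SIGALRM, handler)

    worker = threading.Thread(target=time.sleep, args=(3,), name="worker")
    worker.start()            # this thread never runs the handler

    signal.alarm(1)           # ask the kernel for a SIGALRM in one second
    signal.pause()            # block the main thread until a signal arrives
    # prints: handler ran in: MainThread
    worker.join()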
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1564547&group_id=5470

From noreply at sourceforge.net Tue Jan 30 01:52:20 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 16:52:20 -0800
Subject: [Patches] [ python-Patches-1638033 ] Add httponly to Cookie module
Message-ID:

Patches item #1638033, was opened at 2007-01-17 20:07
Message generated for change (Comment added) made by jjlee
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.6
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Arvin Schnell (arvins)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add httponly to Cookie module

Initial Comment:
Add the Microsoft extension httponly to the Cookie module.

----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2007-01-30 00:52
Message: Logged In: YES user_id=261020 Originator: NO

This is backwards-incompatible, no? The behaviour of Morsel.set() changes (disallowing key="httponly"), hence the behaviour of BaseCookie.__setitem__ changes. Do you have a use case?

----------------------------------------------------------------------
Comment By: Arvin Schnell (arvins)
Date: 2007-01-19 17:01
Message: Logged In: YES user_id=698939 Originator: YES

Sure, I have added some documentation to the patch.
File Added: python.diff

----------------------------------------------------------------------
Comment By: Jim Jewett (jimjjewett)
Date: 2007-01-19 15:06
Message: Logged In: YES user_id=764593 Originator: NO

The documentation change should say what the attribute does. (It requests that the cookie be hidden from JavaScript, and available only to HTTP requests.)

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1638033&group_id=5470

From noreply at sourceforge.net Tue Jan 30 02:34:40 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 17:34:40 -0800
Subject: [Patches] [ python-Patches-1508475 ] transparent gzip compression in liburl2
Message-ID:

Patches item #1508475, was opened at 2006-06-19 09:59
Message generated for change (Comment added) made by jjlee
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1508475&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Modules
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jakob Truelsen (antialize)
Assigned to: Nobody/Anonymous (nobody)
Summary: transparent gzip compression in liburl2

Initial Comment:
Some webservers support gzipping content before sending it; this patch adds transparent support for this in urllib2 (documentation: http://www.http-compression.com/). This patch *requires* patch 914340 as a prerequisite, as it enables stream support in the gzip library.
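While the patch itself is not reproduced here, the effect it aims for can be approximated without modifying urllib2, using an ordinary processor that advertises gzip and undoes it on the way back. This sketch buffers the whole body in memory; streaming decompression is exactly the part that needs patch 914340. The GzipHandler name is made up for the sketch:

    import gzip
    import urllib2
    from StringIO import StringIO

    class GzipHandler(urllib2.BaseHandler):
        # Advertise gzip support, and transparently decompress responses.
        def http_request(self, req):
            req.add_header("Accept-Encoding", "gzip")
            return req

        def http_response(self, req, resp):
            if resp.headers.get("Content-Encoding") == "gzip":
                body = gzip.GzipFile(fileobj=StringIO(resp.read())).read()
                old = resp
                resp = urllib2.addinfourl(StringIO(body), old.headers,
                                          old.url, old.code)
                resp.msg = old.msg
            return resp

    opener = urllib2.build_opener(GzipHandler())
    print opener.open("http://www.example.com/").read()[:200]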
----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2007-01-30 01:34
Message: Logged In: YES user_id=261020 Originator: NO

Looks good. This needs tests and docs. As a new feature, this could not be released until Python 2.6. It would be nice to have support for managing content negotiation in general, but that wish isn't an obstacle to this patch.

----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1508475&group_id=5470

From noreply at sourceforge.net Tue Jan 30 03:15:41 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 18:15:41 -0800
Subject: [Patches] [ python-Patches-1550272 ] Add a test suite for unittest
Message-ID:

Patches item #1550272, was opened at 2006-09-01 03:44
Message generated for change (Comment added) made by jjlee
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1550272&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Tests
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Collin Winter (collinwinter)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add a test suite for unittest

Initial Comment:
This file replaces the current version of Lib/test/test_unittest.py, which only contains a single test. The attached suite contains 128 tests for the mission-critical parts of unittest. A patch will follow shortly that fixes the bugs in unittest uncovered by this test suite.

----------------------------------------------------------------------
Comment By: John J Lee (jjlee)
Date: 2007-01-30 02:15
Message: Logged In: YES user_id=261020 Originator: NO

Oh the irony. :) This is good stuff. I have not reviewed the whole patch, but sampling bits of it, it looks fine. No great danger in committing this, so why not commit it? Of the following points, I think only the first should block commit of this patch. Any comments on that first point?

1. test_loadTestsFromName__module_not_loaded() and test_loadTestsFromNames__module_not_loaded() -- these may break in future, and may break e.g. only when running tests in random order, which is sometimes done when debugging obscure stuff. Better to introduce a module of your own in Lib/test that's guaranteed not to be loaded already -- maybe test_unittest_fodder.py. Still, that wouldn't help the case where somebody is running the tests in a loop, which would cause failures already (again, this is something people do as part of bug detection / removal). I don't know the import internals and I hear they're messy, but perhaps just del sys.modules[module_name] at the start of each of those two methods is at least an improvement over what they do now.

2. Would be helpful to list what remains to be tested (for example, there is no test of assertRaises)

3. Why no use of .assertRaises?

4. Would be nice to resolve some of the XXXes, but I realise that this may be difficult/impossible given the requirement for backwards-compatibility

----------------------------------------------------------------------
Comment By: Collin Winter (collinwinter)
Date: 2006-09-01 03:52
Message: Logged In: YES user_id=1344176

That promised patch for unittest is #1550273.
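jjlee's point 1 - tests that silently assume a module has not been imported yet - is usually cured by forcing the precondition rather than hoping for it. A sketch of the del sys.modules[...] idea; the choice of sndhdr as a rarely-preloaded stdlib module is an arbitrary assumption of the sketch:

    import sys
    import unittest

    class LoadTestsFromNameTest(unittest.TestCase):
        module_name = "sndhdr"  # any stdlib module unlikely to be preloaded

        def setUp(self):
            # Force the "not yet imported" precondition so the test also
            # passes when run repeatedly or in random order.
            self.was_loaded = sys.modules.pop(self.module_name, None)

        def tearDown(self):
            if self.was_loaded is not None:
                sys.modules[self.module_name] = self.was_loaded

        def test_loads_unloaded_module(self):
            self.assertTrue(self.module_name not in sys.modules)
            suite = unittest.TestLoader().loadTestsFromName(self.module_name)
            self.assertTrue(isinstance(suite, unittest.TestSuite))

    if __name__ == "__main__":
        unittest.main()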
----------------------------------------------------------------------
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1550272&group_id=5470

From noreply at sourceforge.net Tue Jan 30 03:32:04 2007
From: noreply at sourceforge.net (SourceForge.net)
Date: Mon, 29 Jan 2007 18:32:04 -0800
Subject: [Patches] [ python-Patches-1486713 ] HTMLParser : A auto-tolerant parsing mode
Message-ID:

Patches item #1486713, was opened at 2006-05-11 18:19
Message generated for change (Comment added) made by jjlee
You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1486713&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Library (Lib)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: kxroberto (kxroberto)
Assigned to: Nobody/Anonymous (nobody)
Summary: HTMLParser : A auto-tolerant parsing mode

Initial Comment:
Changes:
* Now allows missing spaces between attributes, as it's often seen on the web, like this :